Solaris Real Stuff


Solaris, Sun Stuff Work Related Sun / Solaris Stuff that I pick up from newsgroups and web sites

Friday, November 18, 2005

Backups Under Solaris

Other Backup Utilities

In addition to the basic Unix backup utilities, Solaris offers ufsdump and ufsrestore. These two commands function as a pair. ufsrestore can only restore from tapes created by ufsdump. Both are called from the command line and follow the syntax:

ufsdump options arguments filenames

ufsrestore options arguments filenames

ufsdump only copies data from a raw disk slice. It does not copy free blocks. If a directory contains a symbolic link that points to a file on another disk slice, the link itself is copied. When ufsdump is used with the u option, the /etc/dumpdates file is updated. This file keeps a record of when filesystems were last backed up, including the level of the last backup, day, date and time. ufsdump can be used to back up individual files and directories as well as entire filesystems.

If ufsdump is run without options or arguments, the following defaults are assumed:

ufsdump 9uf /dev/rmt/0 filenames

This means that ufsdump will create a level 9 incremental backup of the specified files, update /etc/dumpdates, and dump the files to /dev/rmt/0.

ufsdump also supports an S option to estimate the amount of space, in bytes, that the backup will require before actually creating it. This number can then be divided by the capacity of the tape to determine how many tapes the backup will need.
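A minimal sketch of that calculation (the filesystem and the 20 GB tape capacity are assumptions):

# Print the estimated size, in bytes, of a level 0 dump without writing anything
BYTES=`ufsdump 0S /export/home`
# Rough tape count for an assumed 20 GB cartridge, rounded up
echo "($BYTES + 19999999999) / 20000000000" | bc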

A series of tape characteristics and options can also be specified.


* c=cartridge
* d=density
* s=size
* t=number of tracks

These options can be given in any order, as long as the arguments that follow are in the same order, e.g.

ufsdump cdst 1000 425 9

This specifies a cartridge tape with a density of 1000, 425MB, and nine tracks. In terms of tape options the syntax is as follows:

ufsdump 9uf /dev/rmt/XAbn filenames

Where

* X=the number of the drive, beginning with 0.
* A=optional density:
  o l=low
  o m=medium
  o h=high
  o u=ultra
  o c=compressed
* b=specifies Berkeley SunOS 4.X compatibility.
* n=no rewind option, which allows other files to be appended to the tape.

ufsrestore has an interactive mode which can be used to select individual files and directories for restoration. It also supports an option to read the table of contents from an archive file instead of the backup media.
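For instance (hedged; the table-of-contents file name is an assumption, and such a file only exists if ufsdump was run with its a option):

# Interactive restore from the default tape drive
ufsrestore if /dev/rmt/0
# List the dump's contents from an archive (TOC) file without touching the tape
ufsrestore ta /var/tmp/dump.toc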

Limits of ufsdump:

* It does not automatically calculate the number of tapes needed to back up a filesystem.
* It does not have a built-in error checking mechanism.
* It does not enable the backing up of files that are remotely mounted from a server.

Solaris also supplies volcopy, a utility to make an image or literal copy of a file system.

Tips and Quirks

The Solaris version of tar includes extra options. The -I option allows the list of files and directories to be backed up to be put into a text file. The -X option allows an exclusion file to be specified that lists the names of files and directories that should be skipped.

The Solaris version of mt supports an asf subcommand which moves the tape to the nth file, n being the number of the file.
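A short hedged sketch of all three (the list file names are assumptions):

# Archive only the paths listed, one per line, in include.list
tar cvf /dev/rmt/0n -I include.list
# Archive the current directory, skipping anything named in exclude.list
tar cvfX /dev/rmt/0n exclude.list .
# Later, position the no-rewind device at tape file 3 and list that archive
mt -f /dev/rmt/0n asf 3
tar tvf /dev/rmt/0n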

posted by Brahma at 4:11 PM 0 comments

Friday, November 04, 2005

QLC + QLA

QLC + QLA

Hi all. Sorry for my English. I have an E4800 server, a T3 array and a SAN. My server has 2 FC adapters: one for the T3, the other for the SAN (HP EVA5000, QLA2300). After reboot I see the qlc driver trying to attach to both FC adapters, but it cannot attach to the QLA2300. I installed the qla2300 driver from HP.com, but it is not initialized. The qlc driver always tries to attach to the FCA 2300. What do I need to do?

Re: QLC + QLA

Maybe you must look at and configure the qla.conf. The best is, you look here for answers:

http://www.qlogic.com/knowledgecenter/

Re: QLC + QLA

The qlc driver can not attach to the QLA2300 card because the QLA2300 card does not work with the QLC driver. This is normal.

However, you should be able to install the qla driver and run both drivers at the same time.

Re: QLC + QLA

The first thing to check is whose HBA each one is, by looking at the PCI identifiers.

Look at the output of the "prtconf -vpD" command.


Each HBA will have two lines that are important. The first to appear is "compatible:" and the second is "name:"

compatible: 'pci1077,2.1077.2.5' + 'pci1077,2.1077.2' + 'pci1077,2' + 'pci1077,2200.5' + 'pci1077,2200' + 'pciclass,10000' + 'pciclass,0000'
###SNIP###
name: 'QLGC,qla'

Now the OS will then look for a "name" listed in the /etc/driver_aliases to match a driver to the HBA. If a "name" is not found, the OS starts using each of the compatible entries and will match drivers to those entries.
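A hedged illustration of that matching step (the compatible strings are taken from the output quoted later in this thread):

# For each compatible identifier, see which driver, if any, claims it
for id in 'pci1077,2300.1077.106' 'pci1077,106' 'pci1077,2300'
do
        echo "== $id =="
        grep "\"$id\"" /etc/driver_aliases
done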

What you could do is run the following commands and post their output here, and I'll tell you what's wrong:

prtconf -vpD | grep 1077
grep ql /etc/driver_aliases

Chesapeake

Re: QLC + QLA

Did you ever get a resolution for this? I have almost the exact same situation and have been unable to get the QLA driver to attach to the 2300.

Re: QLC + QLA

(402) root@cohuxfs27:/etc/cfg/fp> prtconf -vpD | grep 1077
compatible: 'pci1077,2300.1077.106.1' + 'pci1077,2300.1077.106' + 'pci1077,106' + 'pci1077,2300.1' + 'pci1077,2300' + 'pciclass,0c0400' + 'pciclass,0c04'
subsystem-vendor-id: 00001077
vendor-id: 00001077
compatible: 'pci1077,2200.1077.4082.5' + 'pci1077,2200.1077.4082' + 'pci1077,4082' + 'pci1077,2200.5' + 'pci1077,2200' + 'pciclass,010000' + 'pciclass,0100'
subsystem-vendor-id: 00001077
vendor-id: 00001077
compatible: 'pci1077,2200.1077.4082.5' + 'pci1077,2200.1077.4082' + 'pci1077,4082' + 'pci1077,2200.5' + 'pci1077,2200' + 'pciclass,010000' + 'pciclass,0100'
subsystem-vendor-id: 00001077
vendor-id: 00001077
compatible: 'pci1077,2200.5' + 'pci1077,2200' + 'pciclass,010000' + 'pciclass,0100'
vendor-id: 00001077
(403) root@cohuxfs27:/etc/cfg/fp> grep ql /etc/driver_aliases
qlc "pci1077,2200"
qlc "pci1077,2300"
qlc "pci1077,2312"
qla2300 "pci1077,9"
qla2300 "pci1077,100"
qla2300 "pci1077,101"
qla2300 "pci1077,102"
qla2300 "pci1077,103"
qla2300 "pci1077,104"
qla2300 "pci1077,105"
qla2300 "pci1077,109"
qla2300 "pci1077,116"
qla2300 "pci1077,115"

Re: QLC + QLA

I extracted the file adapter.properties from the QLogic SCLI utility. It has an index of the identities of the QLogic cards. That first one of 1077,106 is a Sun Amber-2 X6767, not an HP card as you thought.

The two 1077,4082 cards are Sun Amber HBA ports.

Lastly, the 1077,2200,5 is probably either a generic qlogic card or a fibre down if this is a SB100/280R...

posted by Brahma at 4:58 PM 0 comments

how many file descriptors does the Xnewt process have open?

> When this happens, how many file descriptors does the Xnewt
> process have open? ('ls -l /proc/<pid>/fd' or something
> similar.)

With ls -l /proc/<pid>/fd | wc -l

227 for a fresh GNOME session and one gnome-terminal window open
222 for a fresh XFCE4 session and one xterminal window open

(different PIDs for each of these checks)

A given process is:


root 10091 9074 2 12:09 ? 00:00:00 /usr/X11R6/bin/Xnewt :26 -auth /var/lib/wdm/authdir/authfiles/A:26-AY24Hr -dpms

This is interesting, because here is some of the 'ls -l /proc/10091/fd' output.

lrwx------ 1 root root 64 Mar 16 12:11 10 -> /var/lib/wdm/authdir/authfiles/A:12-d8mxzX (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 100 -> /var/lib/wdm/authdir/authfiles/A:11-QHkbcU (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 101 -> /var/lib/wdm/authdir/authfiles/A:11-bbVcHq (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 102 -> /var/lib/wdm/authdir/authfiles/A:27-FVil3v (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 103 -> /var/lib/wdm/authdir/authfiles/A:9-1iqtPQ (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 104 -> /var/lib/wdm/authdir/authfiles/A:28-ZcO3OP (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 105 -> /var/lib/wdm/authdir/authfiles/A:11-65HZQD (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 106 -> /var/lib/wdm/authdir/authfiles/A:16-J7pYyu (deleted)
lrwx------ 1 root root 64 Mar 16 12:11 107 -> /var/lib/wdm/authdir/authfiles/A:11-RYJlqJ (deleted)

posted by Brahma at 4:56 PM 0 comments

Friday, October 28, 2005

httpd processes each with a size of 148M... is the top "size" display directly related to memory

> When I run top on this box, I can see 6 httpd processes each with a size of 148M... is the top "size" display directly related to memory?

Yes, it's the amount of virtual memory used by this process. You should see a similar number (with much greater detail) by doing pmap -x on the process.

> If it is, how can I possibly be running 6x148M processes just on apache alone?

Every page used by the process is not necessarily private to that process. Read-only portions of the Apache binary may be shared among all 6, and system libraries (like libC) may be shared by many programs.


The 'pmap -x <pid>' output shows that more explicitly.
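As a hedged example (the PID is hypothetical):

# Per-mapping breakdown for one httpd child
pmap -x 4711
# The Kbytes column totals roughly to top's SIZE and RSS to top's RES; the
# last column shows which ranges are shared libraries versus [ heap ].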

Then you might also want to know that there are a lot more tools that offer detailed information regarding system state and performance, e.g. sar, vmstat, iostat, trapstat, cpustat, mpstat, cputrack, busstat, kstat.

Darren already gave a good answer, but I wanted to elaborate a little. (Or, after looking back on this after I've written the whole post, apparently more than a little...)

On a simple computer, there is just a certain amount of RAM available and every address that a program uses (in a pointer, in an address register, or whatever) simply corresponds to part of that RAM. And every program executes in the same address space, which means a given address refers to the same thing whether it's in the context of one process or another.

But on Solaris, there is virtual memory, and every process has its own address space. Using the MMU hardware, the system maps several different ranges of the process's address space to different things. Some of the address ranges are private areas that only the process itself has access to. Any memory you allocate with malloc() will be in a private address range. Some address ranges correspond to regions of files on disk. (In Solaris, executables are loaded by setting up address ranges in the process's address space that correspond to parts of the executable file. And the same thing is done when an executable runs against a shared library.) Some address ranges correspond to other things (sometimes even things like address ranges that are used to communicate with hardware other than RAM).

Now to make things even a bit more complicated, just the existence of an address range within a process's address space does not imply that any RAM is used for that range. For example, if an address range corresponds to a region of a file and if you've never either read from or written to any addresses in that range, Solaris doesn't need to waste time or memory putting that data into RAM. And to make things yet more complicated, even if Solaris does need to use RAM for (part of) an address range, if two processes are using the same region of the same file, Solaris can use the same RAM for both processes, even if the addresses that would be used within the processes to access that data aren't the same addresses. And to make things even yet more complicated, if a process has an address range that contains private data and that process does a fork(), then the twin processes that result can both use the same RAM (or swap space) for that data until the time when one of them tries to change the data (at which point a copy must be made so they have their own separate copies).

So, when you run top, and you see the "SIZE" of the process, what you're seeing is the size of all the address ranges that have been created for that process. Most (but not all) of these address ranges could correspond to data in RAM, but even if they do, it might be data that is shared with another process. So when you see 148M for an Apache process, that just means that there are 148M worth of addresses that the Apache process could theoretically access if it wanted to.

The "RES" column of top's output is a lot closer to the RAM usage of a process, but it's still not exactly the same thing. The "RES" column just tells you, of all the addresses that a process could potentially access, how many of them are currently connected to a particular spot in physical RAM. That is, how many of those pages are resident in physical memory. It doesn't say how many of those are shared with other processes. It's tempting to say that the "RES" column tells you how much memory the process is using, but that's not entirely accurate. It's totally possible for a process to be totally dormant and not running, yet have its resident size increase. This could happen if the dormant process's address space refers to some region of a file that some *other* process just accessed and thus forced into memory.

And in fact, this type of thing partly accounts for why you see such high numbers, even in the "RES" column. You could have a process which only accesses a tiny portion of some file that's mapped into its address space, but if a bunch of other processes also access tiny portions of the same file, that will increase the resident size of all the processes that map the (whole) file. That may seem a little unlikely, but actually it is quite common, because things like libc.so (the C shared library) are used by a bunch of processes, and even though each process might only use a few functions, still when you count the total number of functions that are actively used from that library, a significant portion of the library ends up being resident, and that means that it inflates the resident size numbers of all processes that use it.

> So I learned yesterday that the native stat tool for Solaris is prstat.. So I'm guessing from your posting, Logan, that in prstat, the SIZE column is the total amount of memory that each process can use and the RSS is the actual amount that's used..

"can use" and "that's used" as a description of memory seem incorrect to me.

The difference is between virtual memory pages that are actually resident in RAM (RSS) and those that are allocated, whether currently in RAM or not (SIZE). I don't think that "in use" and "in RAM" are quite the same thing...

> "In a virtual memory(1) system, a process' resident set is that part of
> a process' address space which is currently in main memory. If this
> does not include all of the process' working set, the system may
> thrash."
> That makes sense to me except for one thing... If the SIZE is the total
> amount, how come it fluctuates?

Processes allocate memory while they run. Most programs will grow, but not shrink, but both are possible.

> Whoah.. I just looked at pmap.... that's insane.

Don't try to interpret everything. Most bits are just mappings from other shared objects. In an average program, the most common place for it to consume memory directly is in the [heap].

However the breakdown of shared/private/RAM/total can be useful.

> If I can't map SIZE and RSS to what's available and what's in use, how can I tell when the memory needs to be upgraded?

I was making the distinction between "in use" (which is a little fuzzy for me when talking about pages) and "in RAM" (which is well-defined).

You can map SIZE and RSS to what pages are in RAM at any one time, which will probably be all you need.

What "in use" means, and whether or not that has anything to do with RAM residency, is a different question.

--

The top part of the display of top shows memory usage. The 'swap -s' command shows how much swap is in use. I think both of these include swap as backed up by the executables and shared images on disk as well as the swap backed up by the swap file. The 'swap -l' command will show how much of the actual swap space is in use. A rule of thumb is that when the amount of swap in use starts to be as big as your memory, you need to add memory.
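A hedged sketch of those checks:

swap -s                  # virtual swap reserved and used (includes memory-backed reservations)
swap -l                  # physical swap slices: total blocks versus free blocks
prtconf | grep Memory    # installed RAM, for the rule-of-thumb comparison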


If you have a lot of mostly inactive programs in use, you can allow more swap to be used without hurting performance.

One way is to look at the pi and po (page in and page out) columns in the vmstat output. Assuming you're not starting lots of new programs, high values here could indicate low memory. You should check out Adrian Cockcroft's book, Solaris Performance Tuning (aka the Porsche book).
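For example (hedged):

# One line every 5 seconds; sustained non-zero pi/po and a high sr (scan rate)
# column, when you are not launching new programs, suggest memory pressure
vmstat 5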

Frequent complaints from users about poor performance can also be an indicator of too much paging. :-)

posted by Brahma at 2:14 PM 0 comments

Subject: V210 BGE0@1000FDX

Hi,

When connecting a server to a Gig interface you need to enable autoneg on the server. It will pick up the correct speed automatically. It becomes a problem when trying to force it, especially if using Cisco switches.

Normally I run the following script

ndd -set /dev/bge0 adv_autoneg_cap 1
ndd -set /dev/bge0 adv_1000fdx_cap 1
ndd -set /dev/bge0 adv_1000hdx_cap 0
ndd -set /dev/bge0 adv_100fdx_cap 0
ndd -set /dev/bge0 adv_100hdx_cap 0
ndd -set /dev/bge0 adv_10fdx_cap 0
ndd -set /dev/bge0 adv_10hdx_cap 0

Hope that it resolves your problem.

Regards
Musa

posted by Brahma at 2:12 PM 0 comments

Friday, October 21, 2005

SUMMARY: Tracking down system calls on Solaris 9

Subject: SUMMARY: Tracking down system calls on Solaris 9


Hi all,

Many thanks to everyone who responded - Aleksander Pavic, francisco, and Darren Dunham. My original email is attached below, along with the replies I got - but to summarise: I was seeing a very high sysload on a Solaris 9 web server, and vmstat confirmed that a large number of system calls were being generated. I wanted to track these down and find out what was being called, but couldn't use Dtrace. Yet another argument for moving to Solaris 10 :)

As Darren said in his response: "The limitations on existing tools like 'truss' are part of what drove dtrace, so I don't know that there's any magic out there."

He then went on to suggest I analyse all the Apache processes with truss, send the output to a file and then analyse that. This was also the path suggested by Aleksander, who quite correctly pointed out that truss can be made to follow any child processes generated via forking, so I could therefore truss the main Apache process and follow all its children. He also suggested I send the output to a file, and post-process it with awk or perl. Francisco also suggested the useful lsof tool to see what files are open, as my original hypothesis was that there were a large number of file handles being opened and closed.

In the end, I trussed every "httpd" process, and generated a summary using "truss -c". I let this run for 20 seconds, and saw that there were a very large number of resolvepath() and open() calls being generated; roughly half of these calls returned with an error.

I then narrowed my search down, and examined what was actually being passed as arguments to these calls. This is easily done with "truss -t open,resolvepath". It turns out that a huge number of the resolvepath()'s and open()'s were being generated by PHP scripts running under Apache. They were using an inefficient include_path, and so when most files were being included, PHP generated many resolvepath() and open() calls which returned in error before finally finding the correct location of the file.
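A hedged sketch of those two steps (the PID is hypothetical; interrupt each command after roughly 20 seconds):

truss -c -f -p 1234                      # per-syscall counts for httpd and its forked children
truss -t open,resolvepath -f -p 1234     # then the actual paths and error returns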

We fixed the PHP include_path and also modified some of the scripts to use an absolute path in include() or require() functions, and as expected, the number of syscalls being generated halved itself.

There were a number of other code-related problems on that server as well, but these were unrelated to my original request for help.

Once again, many thanks to those that responded. Problem resolved !


-Mark

posted by Brahma at 2:21 PM 0 comments

Configuring Qlogic HBA card

Subject: SUMMARY Configuring Qlogic HBA card

I found the solution. Thanks to those who replied!

At OBP:

ok> show-devs
...<QLGC entry>...
ok> select <QLGC entry>
ok> show-children
<Lists info about card such as WWN, LoopId, Lun>
ok> show-connection-mode
Current HBA connection mode: 2 - Loop preferred, otherwise point-to-point

(Possible connection mode choices:
0 - Loop Only
1 - Point-to-point only
2 - Loop preferred, otherwise point-to-point)

ok> set-connection-mode (0, 1, or 2)
ok> show-data-rate
Current HBA data rate: One Gigabit rate

Possible data rate choices:
0 - One Gigabit rate
1 - Two Gigabit rate
2 - Auto-negotiated rate

ok> set-data-rate (0, 1, or 2)

To set the data rate.

More info can be found at http://download.qlogic.com/boot_code/23020/Readme.txt

John

Original Question:


I have a QLogic 2300 card in a SF V440. I am trying to install SecurePath but the card is not seen by Solaris 8. (It is seen at OBP level.) At my last job I had a similar problem which I fixed by setting the speed at the OBP level. (The card is set to autoneg but I need to force it to 1 gig.) Unfortunately I could not keep my notes from that job and cannot for the life of me remember how to set the speed. Has anyone got notes on how to do it? I have spent 2 hours scouring google with no luck. Thanks, John

posted by Brahma at 2:20 PM 0 comments

How do I install Sun Explorer

4. How do I install Sun Explorer?

After downloading the SunExplorer.tar.Z file from sunsolve.sun.com:

cp SunExplorer.tar.Z /var/tmp

cd to /var/tmp

uncompress SunExplorer.tar.Z

tar xvf SunExplorer.tar

or, if you have gzip installed,

zcat SunExplorer.tar.Z | tar xvf -

This will extract the contents of the archive into two directories, SUNWexplo and SUNWexplu, located in the current directory.

As superuser, type the following command:

# pkgadd -d . SUNWexplo SUNWexplu

posted by Brahma at 2:19 PM 0 comments

Ethernet card woes

Subject: SUMMARY: Ethernet card woes


Thanks to everyone who replied! It turns out that /etc/inet/ipnodes had the old IP address set - thanks to Dale Hirchert for pointing that out. Also, if I had run sys-unconfig it would have caught it as well - thanks Dominic Clark.

So for future reference, either change the following files:

/etc/inet/hosts
/etc/inet/netmask
/etc/inet/ipnodes

or run sys-unconfig.

Thanks again!

Will

posted by Brahma at 2:19 PM 0 comments

read/ write performance on the volumes.

Hi Greg,

I can't offer any suggestions, but I am interested in knowing how you are measuring the read/write performance on the volumes. HDS tool? Or something more common?

Thanks, V

> Hello,

> Just a quick info gathering. I am at a customer site installing a new HDS 9990. The high level config overview:

> HDS 9990 (Open V 40GB LUNS)
> HDS 9200 (Legacy Array)
> Sun Fire v880
> Brocade 4100's (2 Fabrics)
> QLogic 2GB Cards (375-3102) to new SAN
> JNI FCE2-6412 cards to old HDS 9200 Array
> MPXIO enabled and configured for Round Robin
> VxVM 4.1
> Oracle 9i

> During this phased implementation, we are in the data migration stage. We are mirroring the old storage, which is from a HDS 9200, to the new LUNS on the TS (9990). Once the mirroring is complete, we will break off the plexes from the old array and be fully migrated to the new Hitachi.

> The customer decided not to break the mirrors yet. We have noticed a decrease in write and read performance on all the volumes on the host. I would expect a slight decrease in write performance; however, we are seeing up to a 1/5 millisecond increase in read time as well on each of the volumes. My assumption is that because of the double writes to two different (types) of LUNS, that is impacting our reads.

> Suggestions?

Reply

Hi

I am using vxstat -g 'diskgroup' -i 1 (the -i is the interval I am polling, in this case every one second). This output is giving me a format like this:

                    OPERATIONS          BLOCKS       AVG TIME(ms)
TYP NAME         READ   WRITE      READ   WRITE      READ  WRITE
vol ora00       39364     388   4931856    6208       1.8    0.1
vol ora00       39585     389   4950704    6224       1.9    0.0
vol ora00       39571     391   4954960    6256       1.8    0.1

As for Solaris LUN metrics, I generally use iostat -xnpz 1, which is giving me the disk & tape I/O data and excluding any zero's. It's a lot of information, so what I do is grep out what I am looking for, for example, iostat -xnpz 1 | grep c5t0d0.

Thanks,

Greg

posted by Brahma at 2:17 PM 0 comments

cannot create /etc/foo: Operation not applicable

Subject: SUMMARY: cannot create /etc/foo: Operation not applicable

Original question:


> On a Solaris 8 system running fine for two months, I suddenly get this:
>
> # touch /etc/foo
> touch: cannot create /etc/foo: Operation not applicable
>
> Truss says:
> creat("/etc/foo", 0666) Err#89 ENOSYS
>
> I also noted truncated files in /etc.
>
> There is nothing interesting in the system log. System is a V210 running
> Solaris 8 with recommended patches from feb. 28 2005. Root filesystem is
> mirrored using SVM.

The responses I received include:

- Are you out of disk space
- Are you out of inodes
- Do you have the same problem on other partitions like /var or /opt
- Are you running the automounter
- Are the permissions wrong on /etc
- Is the "touch" command malfunctioning
- Is the root filesystem mounted read-only
- Are you also unable to modify files in /etc
- Does your metastat output show weird things
- Do you already have a file named "foo" in /etc

The answer is "no" to all these points. So I requested downtime with the customer to bring the system into single-user mode to do a filesystem check. As expected, many errors showed up, but it was able to repair the root filesystem and the system is running fine now.

I also logged a case with Sun Support about this issue. They sent me two documents from SunSolve that describe common reasons for filesystem corruption. Since the call is closed now I cannot retrieve the document ID's, sorry for that. The only two reasons that remain after reading these documents are:

- Applications use the unlink(2) system call without checking if the directory is empty. This is a classical UNIX problem.
- Bugs in the O.S.

I have no idea how to check if some of the running processes are misusing unlink(2). Maybe dtrace can do this, but this is a Solaris 8 system. As for bugs in the OS, I haven't found applicable ones on SunSolve.


Thanks to all who replied.

-- Koef.

posted by Brahma at 2:14 PM 0 comments

Question about Sun patch: How to find out what patches I've just installed

Subject: SUMMARY: Question about Sun patch: How to find out what patches I've just installed

Many thanks to everybody who replied. You know who you are :-)

Original question: I need to install a bunch of Sun patches into a Solaris 8 system. How do I find out the list of Sun patches I just installed (using patchadd patch#)?

Answers:

1. # ls -ltr /var/sadm/patch | tail -xx
where xx is the # of patches I've just installed (i.e. if I installed 20 patches, then the xx number will be 20, for example)

2. # showrev -p | egrep 'patch #1|patch #2|patch #3' etc.
where patch #1,2,3 are the 3 patches I've just installed

"showrev -p" alone won't cut it (but otherwise is technically correct) because the output includes too many previous patches. It will be kind of hard to verify which one.

posted by Brahma at 2:14 PM 0 comments

logical volume problem

logical volume problem

Hi all, I am using veritas VM 3.5. When I want to create a raid 5 volume by the following command,


bash-2.05# vxassist -g diskgroup make vol-1 10m layout=raid5
vxvm:vxassist: ERROR: Too few disks for striping; at least 4 disks are needed
bash-2.05# vxdisk -g diskgroup list
DEVICE       TYPE      DISK       GROUP      STATUS
c2t1d0s2     sliced    diskgro02  diskgroup  online
c2t2d0s2     sliced    diskgro03  diskgroup  online
c2t3d0s2     sliced    diskgro01  diskgroup  online

I get the error "ERROR: Too few disks for striping; at least 4 disks are needed". RAID 5 only needs 3 disks. Why?

Reply

Because there are some interesting failure modes where a crash can occur in the middle of a write, leaving you not knowing if parity is right or wrong. Combined with a disk error, you can have problems.

To get around that failure mode, the default for a raid5 construction with vxassist is to create an additional log device which is not on any disk shared with the raid5 data. It's small, but must be on a separate disk. There might be a way of using the same disk but replicating it.

If you don't want the log disk, you can specify 'nolog' or 'noraid5log', then it will only need 3 columns.

vxassist -g diskgroup make vol-1 10m layout=raid5,nolog

log, nolog
        Creates (or does not create) dirty region logs (for mirrored volumes) or log plexes (for RAID-5 volumes) when creating a new volume. This attribute can be specified independently for mirrored and RAID-5 volumes with the raid5log and regionlog layout specifications. The current implementation does not support the creation of DCM logs in the layout specification.

raid5log, noraid5log
        Creates (default) or does not create log plexes for RAID-5 volumes.

posted by Brahma at 2:13 PM 0 comments

restored filesystem - comparison to original


restored filesystem - comparison to original

Having devised and operated a backup scheme and schedule since the start of the month, I'd quite like to perform a restoration in order to test it.

I will restore the file system to a separate disk from the original.

But what's the "best" way to compare the two, so I can be sure that the scheme I have devised is capable of backing up properly, and also that my proposed restore mechanism restores properly?

The fs in question is only 1GB at present, so any suggested comparison can be time-consuming in nature.....

I'd obviously like to check for missing files/directories and errors with ownerships, permissions, ACLs, and timestamps...

How do I go about this?

Cheers

> Rob

Running tripwire on the original and the copy comes to mind. Then compare the output tripwire databases.

> with ownerships, permissions, ACLs, and timestamps...

You could try the filesync tool with the "-n" option, which will make it just find the differences and not attempt to make changes. If you back up /foo and restore it into /restore/foo, then the filesync command would be something like this:

filesync -n -ame -fsrc -s / -d /restore foo

The "-n" means not to make any changes, the "-ame" means to check ACLs and modification times and flag everything found (even if it can't be changed), the "-fsrc" means to consider the source directory to be the authoritative one, "-s" specifies the directory that CONTAINS the source thingy to be synchronized, "-d" specifies the directory that contains the destination thing to be synced, and "foo" is the thing to be synced.

If you wanted to compare all of "/" against something contained in "/" (such as "/restore"), you could type this in ksh or bash:


cd /
filesync -n -ame -fsrc -s / -d /restore ./

Then when the cursor is at the end of the line, do ESC then "*" in vi mode or Meta-"*" in emacs mode, and it will expand the list of files, at which point you can delete "restore" from the list. (If you don't delete "restore" from the list, it will think everything in "restore" should be in "restore/restore", which will cause the output to be filled with extraneous stuff.)

- Logan

> I will restore the file system to a separate disk from the original.

The best way is to use "star" to compare both filesystems, as it is the only known program that is able to compare _all_ file properties and meta-data (except for Extended attribute files).

As I currently know nobody who uses Extended attribute files, I am sure that this will fit your needs.

Call:

star -c -diff -vv -dump -acl -sparse diffopts=!atime,ctime,lmtime -C fromdir . todir

BTW: This is also the fastest known method, and if you like to copy a filesystem, a similar method will copy the fs very fast.

Also have a look at star when doing incremental dumps. It might be more interesting for you than ufsdump/ufsrestore.

ftp://ftp.berlios.de/pub/star/alpha/

posted by Brahma at 2:12 PM 0 comments

How to free virtual memory

Re: How to free virtual memory

Looks like some process is leaking memory continuously. The process needs to be fixed; use some open source tool like valgrind to find out which process is leaking memory.


Regards,
Chinmoy
For free software books visit http://geocities.com/freesoftwarebooks

We deployed a SeeBeyond project on a Unix machine, and I have a problem with exceeding the virtual memory; the virtual memory keeps on increasing, and at some point it is locking up the unix server.

Is there any way to free the virtual memory faster, and can we stop the increase of virtual memory?

I need some help urgently.

Regards
Rambabu.Y

posted by Brahma at 2:11 PM 0 comments

Tuning the I/O Subsystem

Tuning the I/O Subsystem

I/O is probably one of the most common problems facing Oracle users. In many cases, the performance of the system is entirely limited by disk I/O. In some cases, the system actually becomes idle waiting for disk requests to complete. We say that these systems are I/O bound or disk bound.

As you see in Chapter 14, "Advanced Disk I/O Concepts," disks have certain inherent limitations that cannot be overcome. Therefore, the way to deal with disk I/O issues is to understand the limitations of the disks and design your system with these limitations in mind. Knowing the performance characteristics of your disks can help you in the design stage.

Optimizing your system for I/O should happen during the design stage. As you see in Part III, "Configuring the System," different types of systems have different I/O patterns and require different I/O designs. Once the system is built, you should first tune for memory and then tune for disk I/O. The reason you tune in this order is to make sure that you are not dealing with excessive cache misses, which cause additional I/Os.

The strategy for tuning disk I/O is to keep all drives within their physical limits. Doing so reduces queuing time and thus increases performance. In your system, you may find that some disks process many more I/Os per second than other disks. These disks are called "hot spots." Try to reduce hot spots whenever possible. Hot spots occur whenever there is a lot of contention on a single disk or set of disks.

Understanding Disk Contention

Disk contention occurs whenever the physical limitations of a disk drive are reached and other processes have to wait. Disk drives are mechanical and have a physical limitation on both disk seeks per second and throughput. If you exceed these limitations, you have no choice but to wait.

You can find out if you are exceeding these limits both through Oracle's file I/O statistics and through operating system statistics. This chapter looks at the Oracle statistics; Chapter 12, "Operating System-Specific Tuning," looks at the operating system statistics for some popular systems.

Although the Oracle statistics give you an accurate picture of how many I/Os have taken place for a particular data file, they may not accurately represent the entire disk because other activity outside of Oracle may be incurring disk I/Os. Remember that you must correlate the Oracle data file to the physical disk on which it resides.

posted by Brahma at 2:10 PM 0 comments

debugging RC scripts. solaris9

debugging RC scripts. solaris9

I can't remember how to debug the startup scripts. Can someone help me out here? I just want Solaris to report what startup script it is currently executing. I thought it was as simple as adding a "+" to the /etc/rc* script, but that didn't work.

There's no really simple way to do this. You may be thinking of adding set -x to /etc/rc?, but that gets overly verbose for me.

Often I've made a small edit to /etc/rc?. There's a startup loop in there where it runs /bin/sh $f start or so for each of the scripts. Just add an "echo starting $f" and an "echo done starting $f" inside the "if" and outside the "case" statements (and make a backup!). Then you can tell what it's trying to do and where it hangs.
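A hedged sketch of what that edit might look like (the exact loop wording differs between Solaris releases):

for f in /etc/rc2.d/S*; do
        if [ -s $f ]; then
                echo "starting $f"
                case $f in
                        *.sh)   . $f ;;             # *.sh scripts are sourced in place
                        *)      /sbin/sh $f start ;;
                esac
                echo "done starting $f"
        fi
done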

Once there, you can make that one script more verbose.


If you were running Solaris 10, you could use 'boot -m verbose'.

-

posted by Brahma at 2:10 PM 0 comments

statvfs / df bug?

statvfs / df bug?

Hi all, I am trying to get the filesystem information using statvfs/df. I have an automounted partition mounted on /mntauto. I am running SunOS 5.8.

If I say 'df -k /mntauto', I get the following output:
--------
bash-2.03# df -k /mntauto
Filesystem            kbytes    used   avail capacity  Mounted on
cpsupsun1:/mntauto    482455      10  434200     1%    /mntauto
--------

I have written two different programs using 'statvfs' to print the filesystem information. Following is the output from these two different programs:

Program 1:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/statvfs.h>

int main(void) {
    struct statvfs info;

    access("/mntauto", F_OK);

    if (-1 == statvfs("/", &info))
        perror("statvfs() error");
    else {
        puts("statvfs() returned the following information");
        puts("about the ('/mntauto') file system:");
        printf(" f_bsize    : %u\n", info.f_bsize);
        printf(" f_blocks   : %u\n", info.f_blocks);
        printf(" f_bfree    : %u\n", info.f_bfree);
        printf(" f_bavail   : %u\n", info.f_bavail);
        printf(" f_files    : %u\n", info.f_files);
        printf(" f_ffree    : %u\n", info.f_ffree);
        printf(" f_fsid     : %u\n", info.f_fsid);
        printf(" f_flag     : %X\n", info.f_flag);
        printf(" f_namemax  : %u\n", info.f_namemax);
        printf(" f_basetype : %s\n", info.f_basetype);
        printf(" f_fstr     : %s\n", info.f_fstr);
    }
    return 0;
}

Output:
statvfs() returned the following information
about the ('/mntauto') file system:
f_bsize    : 8192
f_blocks   : 4129290
f_bfree    : 3026760
f_bavail   : 2985468
f_files    : 512512
f_ffree    : 507103
f_fsid     : 8388608
f_flag     : 4
f_namemax  : 255
f_basetype : ufs
f_fstr     :

Program 2:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

int main(void) {
    struct statvfs info;
    struct stat sb;

    if (stat("/mntauto/log", &sb) < 0)
        printf("stat failed\n");

    if (S_ISDIR(sb.st_mode))
        printf("Dir\n");
    else
        printf("Not a dir\n");

    if (-1 == statvfs("/mntauto", &info))
        perror("statvfs() error");
    else {
        puts("statvfs() returned the following information");
        puts("about the ('/mntauto') file system:");
        printf(" f_bsize    : %u\n", info.f_bsize);
        printf(" f_blocks   : %u\n", info.f_blocks);
        printf(" f_bfree    : %u\n", info.f_bfree);
        printf(" f_bfree    : %u\n", info.f_bavail);  /* label repeats f_bfree but prints f_bavail, as in the original post */
        printf(" f_files    : %u\n", info.f_files);
        printf(" f_ffree    : %u\n", info.f_ffree);
        printf(" f_fsid     : %u\n", info.f_fsid);
        printf(" f_flag     : %X\n", info.f_flag);
        printf(" f_namemax  : %u\n", info.f_namemax);
        printf(" f_basetype : %s\n", info.f_basetype);
        printf(" f_fstr     : %s\n", info.f_fstr);
    }
    return 0;
}

Output:
Dir
statvfs() returned the following information
about the ('/mntauto') file system:
f_bsize    : 8192
f_blocks   : 964910
f_bfree    : 964890
f_bfree    : 868400
f_files    : 247296
f_ffree    : 247290
f_fsid     : 80740364
f_flag     : 0
f_namemax  : 4294967295
f_basetype : nfs
f_fstr     :

Could anyone please explain to me the differences between the above outputs? I assume all of the above should print the same answer... Is this a known bug of statvfs?

Thanks in advance,
Tushar.

Reply

My apologies. I got the error! It was a typo...

I would say everything is working okay. Program 1 does a statvfs of "/" and program 2 calls statvfs for "/mntauto".

posted by Brahma at 2:09 PM 0 comments

use telnet command in a shell script


Re: use telnet command in a shell script

I would use ssh instead of telnet. Set up a connection with keys so that the connection does not require a password (man ssh to find out how), then call it as follows:

ssh hostname -l username "command" | tee -a output.txt

posted by Brahma at 2:07 PM 0 comments

fsck

> Our system is backed up to tape each night. Unfortunately our Sys Eng is on holidays and I do not know how to recover from tape.

> Could you walk me through it please.

There are lots of ways to do it. Without knowing which your system admin chose, it's really hard to give you any useful information.

The most likely thing is that you need to use something like "mt -f /dev/rmt/0n asf 2" to move to file #2 on the tape (the number 2 is just a random number picked; you'll have to determine where the backup of the filesystem you need is located on the tape and use that number instead). Then you'd change to some directory (like /tmp or some place with lots of space) and do a "ufsrestore ivf /dev/rmt/0n". Then use "cd", "ls", and "pwd" to navigate, "add" and "delete" to select which files to extract, and "extract" (whose prompt you should answer with "1") to extract them from the tape. Oh, and then "mt -f /dev/rmt/0n offline" to rewind and eject the tape.
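Collected into one hedged sketch (the tape file number is just the example from above):

mt -f /dev/rmt/0n asf 2         # position the no-rewind device at tape file #2
cd /tmp                         # somewhere with enough free space
ufsrestore ivf /dev/rmt/0n      # interactive restore: ls/cd, add, delete, extract
mt -f /dev/rmt/0n offline       # rewind and eject when finished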

Of course, this assumes that the administrator chose to use ufsdump to back up the files, which is definitely not a given. Also, it is quite possible that the administrator chose to do incremental backups, so if that is the case, you may need to restore from a full backup tape *and* an incremental backup tape, which makes things even more complicated. It's really hard to know what the right thing to do is without knowing what backup scheme the administrator chose for that system.

posted by Brahma at 2:07 PM 0 comments

if I do a metadb -a /dev/dsk/c0t0d0s4


> i.e. if I do a metadb -a /dev/dsk/c0t0d0s4, will it just add another database replica into the slice?

No. You'll have to delete all the replicas in one slice, then create all the replicas at one time.

metadb -d /dev/dsk/c0t0d0s4
metadb -a -c 2 /dev/dsk/c0t0d0s4

posted by Brahma at 2:06 PM 0 comments

Tape Control -the mt Command:

Tape Control -the mt Command:

This assumes that the device is at the 0 address.

Shows whether the device is valid, whether a tape is loaded, and the status of the tape:

mt -f /dev/rmt/0 status

Rewinds the tape to the start:

mt -f /dev/rmt/0 rewind

Shows the table of contents of an archive. If tar tvf produces an error, then there are no more records on the tape:

tar tvf /dev/rmt/0

Advances to the next archive on the tape:

mt -f /dev/rmt/0 fsf

Moves the tape to the end of the last archive that it can detect:

mt -f /dev/rmt/0 eom

Erases the tape. Use with care:

mt -f /dev/rmt/0 erase

Ejects the tape, if the device supports that option:

mt -f /dev/rmt/0 offline


To extract lengthy archives even if you plan to log out, use the nohup command as follows:

nohup tar xvf /dev/rmt/0 &

Identify the tape device

dmesg | grep st

Check the status of the tape drive

mt -f /dev/rmt/0 status

Tarring files to a tape

tar cvf /dev/rmt/0 *

Cpioing files to a tape

find . -print | cpio -ovcB > /dev/rmt/0

Viewing cpio files on a tape

cpio -ivtB < /dev/rmt/0

Restoring a cpio

cpio -ivcB < /dev/rmt/0

To compress a file

compress -v some.file

To uncompress a file

uncompress some.file.Z

To encode a file

uuencode some.file.Z some.file.Z

To unencode a file

uudecode some.file.Z some.file.Z

To dump a disk slice using ufsdump


ufsdump 0cvf /dev/rmt/0 /dev/rdsk/c0t0d0s0
or
ufsdump 0cvf /dev/rmt/0 /export/home

To restore a dump with ufsrestore

ufsrestore rvf /dev/rmt/0

To duplicate a disk slice directly

ufsdump 0f - /dev/rdsk/c0t0d0s7 |(cd /home;ufsrestore xf -)

posted by Brahma at 2:05 PM 1 comments

Mirror Removal

How To: Mirror Removal

To remove a mirror from a volume (i.e., to remove one of the plexes that belongs to the volume), run the following command:

vxplex -o rm dis
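For example, with a hypothetical disk group dg01 and plex vol01-02 (the original arguments were lost in transcription):

vxplex -g dg01 -o rm dis vol01-02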

Any associated subdisks will then become available for other uses. To remove the disk from Volume Manager control entirely, run the following command:

vxdisk rm

For example, "vxdisk rm c1t1d0s2".

How To: Mirror Backup

The following techniques can be used to back up mirrored volumes by temporarily taking one of the mirrors offline and then reattaching the mirror to the volume once the backup has been run. (A worked example with hypothetical names follows the steps below.)

1. Disassociate one of the mirrors from the volume to be backed up:

vxplex dis

2. Create a new, temporary volume using the disassociated plex:

vxmake -g -U gen vol tempvol plex=

3. Start the new volume:


vxvol start tempvol

4. Clean the new volume before mounting:

fsck -y /dev/vx/rdsk//tempvol

5. Mount the new volume and perform the backup

6. Unmount the new volume

7. Stop the new volume:

vxvol stop tempvol

8. Disassociate the plex from the new volume:

vxplex dis

9. Reattach the plex to the original volume:

vxplex att

10. Delete the temporary volume:

vxedit rm tempvol
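Putting the steps together as a hedged worked example, with a hypothetical disk group dg01, volume vol01 and plex vol01-02 (the arguments missing from the steps above were lost in transcription, and the tar command is only a placeholder for whatever backup you run):

vxplex -g dg01 dis vol01-02
vxmake -g dg01 -U gen vol tempvol plex=vol01-02
vxvol -g dg01 start tempvol
fsck -y /dev/vx/rdsk/dg01/tempvol
mount /dev/vx/dsk/dg01/tempvol /mnt
tar cvf /dev/rmt/0 -C /mnt .
umount /mnt
vxvol -g dg01 stop tempvol
vxplex -g dg01 dis vol01-02
vxplex -g dg01 att vol01 vol01-02
vxedit -g dg01 rm tempvol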

To display the current Veritas configuration, use the following command:

vxprint

To monitor the progress of tasks, use the following command:

vxtask -l list

To display information related to plexes, run the following command:

vxprint -lp

posted by Brahma at 2:04 PM 0 comments

Fixing Corrupted Files and wtmpx Errors

Fixing Corrupted Files and wtmpx Errors

Unfortunately, system accounting is not foolproof. Occasionally, a file becomes corrupted or lost. Some of the files can simply be ignored or restored from backup. However, certain files must be fixed to maintain the integrity of system accounting.

The wtmpx files seem to cause the most problems in the daily operation of the system accounting. When the date is changed manually and the system is in multiuser mode, a set of date change records is written into the /var/adm/wtmpx file. The wtmpfix utility is designed to adjust the time stamps in the wtmp records when a date change is encountered. However, some combinations of date changes and reboots slip through the wtmpfix utility and cause the acctcon program to fail.

How to Fix a Corrupted wtmpx File

1. Become superuser.

2. Change to the /var/adm directory.

3. Convert the wtmpx file from binary to ASCII format.

# /usr/lib/acct/fwtmp < wtmpx > wtmpx.ascii

4. Edit wtmpx.ascii to delete the corrupted records.

5. Convert the wtmpx.ascii file back to a binary file.

# /usr/lib/acct/fwtmp -ic < wtmpx.ascii > wtmpx

See fwtmp(1M) for more information.

posted by Brahma at 2:04 PM 0 comments

Is there a way to determine the PID associated with a socket ?

lsof

or

> Is there a way to determine the PID associated with a socket ?


Or, using native commands without lsof, you can use pfiles.

cd /proc
pfiles * > /tmp/pfiles.out

Search through pfiles.out for the process that has the socket open that you are interested in, i.e. there will be entries such as:

3771:   /export/home/archiver/bin/myprocess
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:85,0 ino:191320 uid:0 gid:3 rdev:13,2
      O_RDONLY|O_LARGEFILE
   1: S_IFCHR mode:0666 dev:85,0 ino:191397 uid:0 gid:0 rdev:24,2
      O_RDWR|O_LARGEFILE
   2: S_IFREG mode:0644 dev:85,5 ino:17 uid:104 gid:1 size:139436
      O_WRONLY|O_LARGEFILE
   3: S_IFDOOR mode:0444 dev:293,0 ino:58 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[200]
   4: S_IFSOCK mode:0666 dev:287,0 ino:19574 uid:0 gid:0 size:0
      O_RDWR
        sockname: AF_INET 10.1.1.1  port: 9001
        peername: AF_INET 10.1.1.2  port: 9001

pid 3771 has port 9001 open locally.
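A hedged one-liner for the same search (the port number is just this example's):

cd /proc && pfiles * 2>/dev/null | egrep '^[0-9]+:|port: 9001'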

posted by Brahma at 2:04 PM 1 comments

snapshot error: File system could not be write locked

Thomas wrote:
> Is there any way to fssnap the root file system? I would like to use a
> snapshot for backup, but when I try to do that I get the error:
> snapshot error: File system could not be write locked

Are you running ntp/xntp? By default that program runs in the realtime processing class, and has a current directory in the root filesystem. You can't write lock the filesystem while that's true.

>> Yes, we are running ntp. I will try killing that and see what happens.
>> I am thinking that it might be better to repartition the disk rather than
>> to take ntp down and up for every backup.
> Why? Simply create a script that stops xntpd, creates a snapshot, starts
> xntpd and perform backup. No need to repartition.

But that's not kind to ntp, which wants to keep running to remain stable.


The only problem here is that its working directory is in root. I see two possible workarounds.

#1 Have it run in a non-RT class. You can use priocntl for that. I don't think it'll have a dramatic effect on the timekeeping, but you might not want to do this if you need very accurate time on this machine.

$ pgrep ntp
302
$ /usr/bin/ps -o class,pid -p 302
 CLS   PID
  RT   302
$ priocntl -s -c TS 302
$ /usr/bin/ps -o class,pid -p 302
 CLS   PID
  TS   302

Thus taking it from the realtime class to the timesharing class. (I suppose I should have tried a fssnap at that point, but didn't...)
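For what it's worth, a hedged sketch of what the snapshot attempt itself would look like (the backing-store path is an assumption):

fssnap -F ufs -o bs=/var/tmp/rootsnap /     # on success prints the snapshot device, e.g. /dev/fssnap/0
fssnap -d /                                 # delete the snapshot once the backup is done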

#2 Run it with the working directory not in root. I don't see any reason it couldn't run in /tmp or /var/run, unless you wanted to retain any core files that might be generated. I'm not certain how best to achieve that, but I saw a post that suggested someone had good luck using chroot.

posted by Brahma at 2:03 PM 0 comments

Friday, October 14, 2005

Problems with port forwarding using SSH[R]

The port forwarding feature using SSH is failing.

Resolution:

Confirm these configuration settings in /etc/ssh/sshd_config file:

AllowTcpForwarding yes
GatewayPorts yes

Then execute:

# ssh -g -L 8080:webserver:80 webserver


In this example, systems connecting to http://webserver:8080 will be forwarded to the web server daemon httpd listening on TCP port 80 on the host webserver.

If you allow root access with this setting in /etc/ssh/sshd_config:

PermitRootLogin yes

You can use privileged or reserved ports (range 1-1023) with the above command.


posted by Brahma at 3:56 PM 0 comments

SSH Frequently Asked Questions Keys

1.1. General troubleshooting hints

* In order for us to help you, the problem has to be repeatable.

* When reporting problems, always send the output of

$ ssh -v -l <user> <destination>

* If you have a root account on the destination host, please run

# /usr/sue/etc/sshd -d -p 222

or (on Linux)

# /usr/sbin/sshd -d -p 222

as root there and connect using

$ ssh -v -p 222 <user> <destination>

(the sshd server will exit after each connection, and you may run into trouble with a local firewall that prevents you from connecting from a different machine. Same-machine connections to "localhost" should work)

* If you do not have root access on the server, you can generate your own "server" key pair and run on an unprivileged port:

$ ssh-keygen -P "" -f /tmp/sshtest
$ pagsh -c "/usr/sue/etc/sshd -d -p 2222 -h /tmp/sshtest"

or (on Linux)

$ pagsh -c "/usr/sbin/sshd -d -p 2222 -h /tmp/sshtest"

Then connect using

$ ssh -v -p 2222 <user> <destination>

1.5. log in using RSA keys

Do you really want to do this? Using RSA for login means you will not get an AFS token, so you cannot access most of your home directory on the public servers. There is no way to "translate" between RSA keys and AFS tokens.

If you want to give it a try, check the following common errors:

* the UNIX permissions must be correct: 0600 for ~/.ssh/authorized_keys, 0755 for ~/.ssh (and AFS read access for everybody!), home directory not writable by anybody but you (see the sketch after this list).

Warning

Please make sure that your private key is somewhere safe (e.g. in ~/private, with a symlink from ~/.ssh), and encrypted using a good pass phrase.

* in ~/.ssh/authorized_keys, there has to be one key per line (no line breaks allowed)
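A minimal sketch of the permission fixes from the first point above (AFS directory ACLs are managed separately):

$ chmod 755 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys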

The debugging tips at the beginning of this chapter (running the server in debug mode) should point out the reason for failure pretty quickly.


1.6. New warning messages

OpenSSH stores both the host name and the IP number together with the host key. This leads to some new messages:

Warning: Permanently added 'lxplus001,137.138.161.126' (RSA) to the list of known hosts.
Warning: Permanently added the RSA host key for IP address '137.138.161.126' to the list of known hosts.

If these annoy you, use "CheckHostIP no" in your $HOME/.ssh/config file. However, please be aware that you are turning off an intentional security feature of ssh.
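For example, a minimal $HOME/.ssh/config entry that turns the check off for every host:

Host *
    CheckHostIP no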

Some warnings that may appear while connecting to the PLUS servers under their common DNS name (e.g. RSPLUS, HPPLUS) are due to the fact that, for load-balancing purposes, these servers' DNS entry is constantly changing. This is detected and reported by ssh (as it should be).

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!           @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The RSA host key for rsplus has changed,
and the key for the according IP address 137.138.246.82
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!         @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
Please contact your system administrator.
Add correct host key in /afs/cern.ch/user/i/iven/.ssh/known_hosts
to get rid of this message.
Offending key in /afs/cern.ch/user/i/iven/.ssh/known_hosts:246
Password authentication is disabled to avoid Trojan horses.
Agent forwarding is disabled to avoid Trojan horses.


To avoid these, use qualified hostnames like rsplus01, hpplus01 etc. (LXPLUS and SUNDEV are not prone to this problem, since a common host key is used on all the servers in the cluster.)

An alternative is to (manually) insert into $HOME/.ssh/known_hosts the PLUS name after each qualified machine name that belongs to this PLUS service:

rsplus01,rsplus 1024 37 15457042575...rsplus02,rsplus 1024 37 10734479336...

To remove the above error message, simply edit the file ~/.ssh/known_hosts (or ~/.ssh/known_hosts2 for the SSH-2 protocol) and remove the line (which should start with the hostname and/or IP address). Be careful not to break the long lines; it has to have one line per host/key. Next time you connect, ssh should ask you whether you actually want to connect, etc.

1.7. Statistics options for scp

OpenSSH scp does not support a few of the command line options from ssh-1.2.26. Besides, the statistics output is different. The environment variables controlling statistics output (SSH_SCP_STATS, SSH_NO_SCP_STATS, SSH_ALL_SCP_STATS, SSH_NO_ALL_SCP_STATS) are not supported, either. The changed options are:

ssh-1.2.26 option   meaning                                            OpenSSH option
-a                  Turn on statistics display for each file           (on by default)
                    (on by default)
-A                  Turn off statistics display for each file.         (n.a., use -q to turn off all statistics)
                    This appears to be a no-op for ssh-1.2.26
-L                  Use non privileged port                            -o UsePrivilegedPort=no (works as well on ssh-1.2.26)
-Q                  Turn on statistics display                         (on by default)

Sample statistics output from OpenSSH scp (no explicit options)

junk     100% |*****************************|  22867     00:00
zeroes   100% |*****************************|    512 KB  00:00

and output from ssh-1.2.26 scp:

junk    |  22 KB |  22.3 kB/s | ETA: 00:00:00 | 100%
zeroes  | 512 KB | 512.0 kB/s | ETA: 00:00:00 | 100%


If you actually parse this output in scripts, you would have to change them.

1.8. Errors on exit regarding X11 applications

Since the ssh client does forwarding for the X11 traffic from the remote host, it won't exit until the last X11 application has been closed. It appears that this mechanism sometimes fails, and the ssh program will report errors like the ones below even if all remote X11 applications are done:

(logout)
Waiting for forwarded connections to terminate...
The following connections are open:
X11 connection from xxxxx.cern.ch port 2352

The session will appear to hang. It can be closed by typing "~." (without the quotes), and this should return you to your previous shell. You could use "~&" as well to leave the current connection as a background process.

If you are sure that there are no X11 windows or icons from the remote server around, and if you can reproduce the problem, please mail [email protected].

A current suspicion is that the regular network scanning mechanismplays a role in this: by opening a connection to the remote X11 port,but failing to connect through the forwarded channel, this could messup the internal bookkeeping done by ssh. To be confirmed.

posted by Brahma at 3:56 PM 0 comments

Problem replacing disk in StorEdge T3

Problem replacing disk in StorEdge T3

At work we have a T3 where all disks are configured for RAID 5. One of
the disks has failed, which means that accessing the data on the T3 is
really slow.

When I inserted the replacement disk, it seemed to be taken into use
automatically (proc list showed some progress), but then it failed with
a 0D status (see vol stat, fru stat etc. below).

I noticed that the disk is not exactly the same as the others; could this
be the reason? It is a proper replacement disk bought from Sun, with the
proper bracket and everything, so it should work, shouldn't it? What can
I do to fix this?

--- Erlend Leganger

T300 Release 1.17b 2001/05/31 17:47:22
Copyright (C) 1997-2001 Sun Microsystems, Inc.
All Rights Reserved.

bigdaddy:/:<1>vol stat

v0       u1d1 u1d2 u1d3 u1d4 u1d5 u1d6 u1d7 u1d8 u1d9
mounted  0D   0    0    0    0    0    0    0    0

bigdaddy:/:<2>fru list
ID     TYPE               VENDOR      MODEL        REVISION   SERIAL
------ ------------------ ----------- ------------ ---------- --------
u1ctr  controller card    SLR-MI      375-0084-02- 0210       022813
u1d1   disk drive         SEAGATE     ST336605FSUN A338       3FP0H63D
u1d2   disk drive         SEAGATE     ST336704FSUN A42D       3CD0VFBL
u1d3   disk drive         SEAGATE     ST336704FSUN A42D       3CD0T89W
u1d4   disk drive         SEAGATE     ST336704FSUN A42D       3CD0VCZ4
u1d5   disk drive         SEAGATE     ST336704FSUN A42D       3CD0VF5L
u1d6   disk drive         SEAGATE     ST336704FSUN A42D       3CD0TG33
u1d7   disk drive         SEAGATE     ST336704FSUN A42D       3CD0TT8G
u1d8   disk drive         SEAGATE     ST336704FSUN A42D       3CD0VD4T
u1d9   disk drive         SEAGATE     ST336704FSUN A42D       3CD0TXQF
u1l1   loop card          SLR-MI      375-0085-01- 5.02 Flash 033179
u1l2   loop card          SLR-MI      375-0085-01- 5.02 Flash 030038
u1pcu1 power/cooling unit TECTROL-CAN 300-1454-01( 0000       028800
u1pcu2 power/cooling unit TECTROL-CAN 300-1454-01( 0000       028799
u1mpn  mid plane          SLR-MI      370-3990-01- 0000       021282
bigdaddy:/:<3>fru stat
CTLR   STATUS  STATE      ROLE       PARTNER TEMP
------ ------- ---------- ---------- ------- ----
u1ctr  ready   enabled    master     -       30.5

DISK   STATUS  STATE      ROLE       PORT1     PORT2     TEMP VOLUME
------ ------- ---------- ---------- --------- --------- ---- ------
u1d1   ready   disabled   data disk  ready     ready     30   v0
u1d2   ready   enabled    data disk  ready     ready     33   v0
u1d3   ready   enabled    data disk  ready     ready     34   v0
u1d4   ready   enabled    data disk  ready     ready     32   v0
u1d5   ready   enabled    data disk  ready     ready     33   v0
u1d6   ready   enabled    data disk  ready     ready     33   v0
u1d7   ready   enabled    data disk  ready     ready     36   v0
u1d8   ready   enabled    data disk  ready     ready     32   v0
u1d9   ready   enabled    data disk  ready     ready     32   v0

LOOP   STATUS  STATE      MODE    CABLE1    CABLE2    TEMP
------ ------- ---------- ------- --------- --------- ----
u1l1   ready   enabled    master  -         -         27.0
u1l2   ready   enabled    slave   -         -         27.5

POWER  STATUS  STATE     SOURCE OUTPUT BATTERY TEMP   FAN1   FAN2
------ ------- --------- ------ ------ ------- ------ ------ ------
u1pcu1 ready   enabled   line   normal fault   normal normal normal
u1pcu2 ready   enabled   line   normal fault   normal normal normal
bigdaddy:/:<4>exit
Connection closed by foreign host.

Reply

> to fix this?

I'd say, complain @ Sun. Searching Google, I found the documentation
from Seagate. Among other things, it lists this:

ST336605: 29,549 cyl / 4 heads / 71,687,371 data blocks
ST336704: 14,100 cyl / 12 heads / 71,687,369 data blocks

I don't know whether these differences are a problem in this case. Sun
should be able to tell... Maybe the issue can be fixed with a firmware
update on the new drive (or on all the old ones)?

You need to take a look at the syslog file right after the rebuild
fails. There should be more information in there. I have had this
happen before, where the rebuild fails because of a read error on
another disk...

Reply

1) Your boot firmware is very old.
2) Your disk firmware is way out of date.
3) Both the batteries in your PCUs are expired.

The latest boot firmware is 1.18.04 and you're at 1.17b.
That's at least 3 years out-of-date!

The latest disk firmware for the ST336605FSUN is A838.
The latest disk firmware for the ST336704FSUN is AE26.

If you're lucky you'll be able to recover. The 'proc list' command will
show if the new disk is being reconstructed. Otherwise, hopefully you
have a way to back up the data. If so, you can get the batteries
replaced, upgrade all the firmware, reinitialize the volume and restore
the data.

Reply

> another disk...

Thanks for the tip. I have now learnt that the disk should be OK, so I
will try this again tomorrow and watch the syslog as you suggest. I will
be back with the result.

--- Erlend Leganger

Reply

> all the firmware and reinitialize the volume and restore the data.

I guess this is what happens when you have a device that works OK, you
just forget about it... The batteries have been replaced though, we had
ordered them in.

I was able to copy the data from the T3 to other disk areas on the
server, so I'm OK with the files (I also have a backup on tape made
before it failed). I haven't RTFM yet, but are there any tips I should
be aware of when upgrading boot and disk firmware? What to do first?
Where do I get hold of the firmware updates?

> ordered them in.

You have to do more than just replace the batteries or the T3 won't know
anything has changed. Commands need to be run to reset the dates back to
zero so the errors will go away.

This InfoDoc should explain the procedures:

http://www.sunshack.org/data/sh/2.1/infoserver.central/data/syshbk/co...

Also the batteries should now last 3 years instead of 2 years per Sun.

In the same patch you would use to upgrade the boot and disk firmware:


http://sunsolve.sun.com/pub-cgi/pdownload.pl?target=109115-17&method=h

there is a T3extender program that will run commands to set the battery
expiration life to 36 months instead of 24 months.

> another disk...

You were 100% correct. The warning light was lit on disk u1d1, so this
disk was replaced and a rebuild was attempted. The rebuild failed after a
while, with a note of multiple disk errors in the syslog - it seems
u1d4 has a problem as well. I was fooled by vol stat only showing an
error on u1d1 - I will check the syslog more carefully in the future.

--- Erlend Leganger

> http://sunsolve.sun.com/pub-cgi/pdownload.pl?target=109115-17&method=h

Excellent, thank you. I need to wait for my second replacement disk, but
after reading up on the patch installation method, it doesn't seem too
difficult to do.

> there is a T3extender program that will run commands to set the battery
> expiration life to 36 months instead of 24 months.

I had a look at the T3extender program code and I decided that using
this patch is extreme overkill (creating a long perl script and even
including perl itself in the patch) to do a small job: I only ran two
".id write blife <pcu> 36" commands, which seems to do the trick (see
below). Of course, if you have a room full of racks fully populated with
T3s, the script would be handy...

--- Erlend Leganger

bigdaddy:/:<48>id read u1pcu1
Revision             : 0000
Manufacture Week     : 00442000
Battery Install Week : 00412005
Battery Life Used    : 0 days, 2 hours
Battery Life Span    : 730 days, 12 hours
Serial Number        : 028800
Battery Warranty Date: 20051010082149
Battery Internal Flag: 0x00000000
Vendor ID            : TECTROL-CAN
Model ID             : 300-1454-01(50)
bigdaddy:/:<49>id read u1pcu2
Revision             : 0000
Manufacture Week     : 00442000
Battery Install Week : 00412005
Battery Life Used    : 0 days, 2 hours
Battery Life Span    : 730 days, 12 hours
Serial Number        : 028799
Battery Warranty Date: 20051010082152
Battery Internal Flag: 0x00000000
Vendor ID            : TECTROL-CAN
Model ID             : 300-1454-01(50)
bigdaddy:/:<50>.id write blife u1pcu1 36
bigdaddy:/:<51>.id write blife u1pcu2 36
bigdaddy:/:<52>id read u1pcu1
Revision             : 0000
Manufacture Week     : 00442000
Battery Install Week : 00412005
Battery Life Used    : 0 days, 2 hours
Battery Life Span    : 1095 days, 18 hours
Serial Number        : 028800
Battery Warranty Date: 20051010082149
Battery Internal Flag: 0x00000000
Vendor ID            : TECTROL-CAN
Model ID             : 300-1454-01(50)
bigdaddy:/:<53>id read u1pcu2
Revision             : 0000
Manufacture Week     : 00442000
Battery Install Week : 00412005
Battery Life Used    : 0 days, 2 hours
Battery Life Span    : 1095 days, 18 hours
Serial Number        : 028799
Battery Warranty Date: 20051010082152
Battery Internal Flag: 0x00000000
Vendor ID            : TECTROL-CAN
Model ID             : 300-1454-01(50)
bigdaddy:/:<54>

posted by Brahma at 3:55 PM 0 comments

Secure remote tasks with ssh and keys

Secure remote tasks with ssh and keys

Takeaway: If you want to set up another administrator on your server or
execute remote tasks securely, learn to use ssh with keys. Vincent Danen
tells you how in this Linux tip.

Often, if you're administering a server, you'll find you need to
execute some small task on the server, or you want to delegate a task
to another administrator, but you don't want to give them full access.
Perhaps you want to execute a remote backup or status test. This can
all be accomplished using ssh with keys so that it can be unattended,
but still secure.

The first step is to create the ssh key using the ssh-keygen utility.
This is extremely straightforward. If you plan to have the task
unattended, be sure not to give it a password. To increase security,
make a special account to execute the task; make sure it can't log in,
and make sure that the ssh public key is used only on a particular
server or set of servers.
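For example, a minimal sketch of the key-generation step (the key file
name and the dedicated account are assumptions for illustration):

<code>

# generate an unattended (passphrase-less) RSA key for the task account
ssh-keygen -t rsa -f $HOME/.ssh/backup_key -N ""
# backup_key stays on the client; backup_key.pub is what gets copied into
# the remote account's ~/.ssh/authorized_keys

</code>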

On the remote server, copy the user's ssh public key into
~/.ssh/authorized_keys. You will need to make some modifications to
the line in authorized_keys. To begin, you should set a "command"
keyword to ensure that only one particular command can be executed by
that key. The syntax looks like:

<code>

command="" KEY

</code>

where command could be something as simple as "/usr/bin/rsync" or
"/usr/local/bin/foo.sh". To enhance and secure this further, add the
following options to authorized_keys:

<code>

command="/usr/local/bin/foo.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty KEY


</code>

This ensures that anyone connecting cannot do any port forwarding, X11
forwarding, or agent forwarding, and that ssh doesn't allocate a
pseudo-TTY, which prevents the issuing of commands through an
interactive session.

If the client system is adequately secured to protect the
password-less key, and the availability of commands is restricted on
the server, using SSH to execute remote commands is a breeze.
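As a hypothetical usage example, the client side then needs nothing more
than the key itself, since the server decides what actually runs:

<code>

# the only thing this key can trigger is the command named in authorized_keys
ssh -i $HOME/.ssh/backup_key backupuser@remoteserver

</code>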

posted by Brahma at 3:54 PM 0 comments

plumb and unplumb

> Basically it seems it's unplumbed but still existing in the running system,
> so commands like
> arp -a
> netstat -i
> should show it.

No. Unplumbed devices do *NOT* appear in either of those two lists.
Unplumbed devices are simply unknown to IP and ARP.

> If you don't want to reboot, try plumbing and unplumbing it, might do the
> trick.

Very likely not.

Driver loading and operation is only indirectly related to plumbing.
"Plumb" means that IP opens the driver (triggering it to load into
memory if necessary) and begins using it.

"Unplumb" means only that IP closes the driver stream. If the driver
itself is still in memory (and a driver that manages multiple
instances and has one instance still plumbed, as in the original
poster's stated configuration, is certainly in that state), then --
depending on how the driver itself is designed -- it may still be
fielding interrupts from the underlying hardware.

Plumbing and unplumbing IP will do nothing in that case.
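For reference, a quick sketch of what those operations look like from the
command line (ce1 is a hypothetical interface instance):

ifconfig ce1 plumb                                # open the driver stream for IP
ifconfig ce1 192.0.2.10 netmask 255.255.255.0 up  # assign an address and bring it up
ifconfig ce1 unplumb                              # close the stream; the driver itself may stay loaded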

posted by Brahma at 3:53 PM 0 comments

Subject: directio

directio

I have a running process which does io:

last pid: 22838;  load averages: 0.94, 0.91, 0.82    17:54:29
103 processes: 100 sleeping, 1 zombie, 2 on cpu
CPU states: % idle, % user, % kernel, % iowait, % swap
Memory: 4096M real, 256M free, 7188M swap in use, 3392M swap free

PID   USERNAME LWP PRI NICE  SIZE   RES STATE  TIME   CPU     COMMAND
7335  root       2  20    0 1223M 1164M cpu/2  24.7H  49.53%  clemserv_9_0

truss -p 7335

/1: read(15, "020301\0\0\0\0\0\0\0\0\0".., 8192) = 8192
/1: lseek(15, 0x8F48CA00, SEEK_SET)              = 0x8F48CA00
/1: read(15, "\0\0\0\0\0\0\0\0\0\0\0\0".., 8192) = 8192
/1: lseek(15, 0x8F48CB20, SEEK_SET)              = 0x8F48CB20
/1: read(15, "020301\0\0\0\0\0\0\0\0\0".., 8192) = 8192
/1: lseek(15, 0x8F48CA00, SEEK_SET)              = 0x8F48CA00
/1: read(15, "\0\0\0\0\0\0\0\0\0\0\0\0".., 8192) = 8192
/1: lseek(15, 0x8F48CB20, SEEK_SET)              = 0x8F48CB20
/1: read(15, "020301\0\0\0\0\0\0\0\0\0".., 8192) = 8192
/1: lseek(15, 0x8F48CA00, SEEK_SET)              = 0x8F48CA00
/1: read(15, "\0\0\0\0\0\0\0\0\0\0\0\0".., 8192) = 8192

When I monitor the system, both for direct I/O and normal cached I/O,
I see the following patterns. I would appreciate it if someone could
comment in order to explain this:

The program reads data from the /data filesystem. This is UFS on an EMC
disk array (I have one HBA - fibre channel).

mount -o remount,noforcedirectio /data

# sar 30 10000

SunOS verdenfs1 5.9 Generic_117171-12 sun4u 10/05/2005

17:53:42    %usr    %sys    %wio   %idle
17:54:12      14      37       0      49
17:54:42      13      38       0      48
17:55:12      14      39       0      47
17:55:42      13      38       0      49

                    extended device statistics
  r/s  w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1.4  0.1  1206.1   0.4  0.0  0.1    0.1   44.8   0   7 c2t16d65
  0.1  1.2     1.1   9.5  0.0  0.0    0.0    9.2   0   1 c1t1d0
  0.0  1.0     0.0   7.8  0.0  0.0   15.2   35.4   0   0 c1t0d0
                    extended device statistics
  r/s  w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1.2  0.1  1183.2   0.4  0.0  0.1    0.1   48.7   0   6 c2t16d65
  0.3  1.2     2.7   9.5  0.0  0.0    0.0    7.7   0   1 c1t1d0
  0.0  2.4     0.0  19.3  0.1  0.1   29.6   26.9   0   1 c1t0d0
                    extended device statistics
  r/s  w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  7.2  0.1  1227.7   0.4  0.0  0.1    0.0   11.1   0   8 c2t16d65
  0.1  1.2     1.1   9.5  0.0  0.0    0.0    7.4   0   0 c1t1d0
  0.0  0.4     0.0   2.3  0.0  0.0    0.0   11.1   0   0 c1t0d0
                    extended device statistics
  r/s  w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1.3  0.1  1264.0   0.4  0.0  0.1    0.1   44.9   0   6 c2t16d65
  0.3  1.1     2.7   9.5  0.0  0.0    0.0    8.0   0   1 c1t1d0
  0.0  0.2     0.0   1.6  0.0  0.0    0.0   11.4   0   0 c1t0d0

verdenfs1@root/tmp #vmstat -p 30
     memory           page          executable      anonymous      filesystem
   swap    free   re   mf fr de sr epi epo epf api apo apf  fpi fpo fpf
 7402800 1789416  73   89 10  0  1   1   0   0   0   1   1 2626  11   9
 3474096  262832 158    0  0  0  0   0   0   0   0   0   0 1207   0   0
 3474120  262520 187  249  0  0  0   0   0   0   0   0   0 1216   0   0
 3474072  262304 166   33  1  0  0   0   0   0   0   0   0 1188   1   1
 3474208  262144 159    0  0  0  0   0   0   0   0   0   0 1269   0   0

mount -o remount,forcedirectio /data

# sar 30 10000

SunOS verdenfs1 5.9 Generic_117171-12 sun4u 10/05/2005


            %usr    %sys    %wio   %idle
17:57:12       8      24      23      46
17:57:42       2      10      39      49

                    extended device statistics
  r/s    w/s     kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
257.8    0.0  62434.5   0.0  0.0  0.8    0.0    3.1   1  77 c2t16d65
  0.1    1.2      1.1   9.4  0.0  0.0    0.0    8.3   0   1 c1t1d0
  0.4    0.4      3.2   3.2  0.0  0.0    0.0    8.8   0   0 c1t0d0
                    extended device statistics
  r/s    w/s     kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
244.4    0.0  64447.9   0.0  0.0  0.8    0.0    3.3   1  78 c2t16d65
  0.3    1.2      2.7   9.2  0.0  0.0    0.0    7.7   0   1 c1t1d0
  0.7    2.7      4.8  15.6  0.0  0.1    7.1   15.8   0   2 c1t0d0

verdenfs1@root/tmp #vmstat -p 30
     memory           page          executable      anonymous      filesystem
   swap    free  re mf fr de sr epi epo epf api apo apf   fpi fpo fpf
 3474064  263456 31 15  0  0  0   0   0   0   2   0   0 61924   0   0
 3474192  263440 25  8  0  0  0   0   0   0   1   0   0 63965   0   0
 3474064  263088 43 88  0  0  0   5   0   0   0   0   0 63966   0   0

1. When I use noforcedirectio, sar reports no wio but 38 %sys; on the
other hand iostat shows about 1 MB read per second with 44.9 msec
service time. This I/O utilized the disk only 6 or 8 percent. And
vmstat -p shows that the system does paging for file I/O only for 1 MB.

2. But when I force directio, I have 39 %wio, the I/O rate grows
significantly (65 MB per second), the disks are utilized at 77 percent
and the service time is 3.1 msec. At the same time, I see 65 MB of fpi.

1. Why does the low I/O rate in 1 (noforcedirectio) create a 44.9 msec
service time while the high I/O rate creates only 3.1 msec? Isn't it
logical to expect more I/O to mean more service time?

2. Since option 2 uses forcedirectio, how can I explain the large
fpi value? (If directio is in use, why does the operating system cache
file data?)

3. Comparing the 38 %sys with the 39 %wio, which one of them is better?


Kind Regards,
tolgan

I think the output is related to the type of I/O your application does.

If the I/O is done through the "dd" command, the output is just what you
would expect, that is to say: 1) using the "noforcedirectio" option, the
I/O uses more CPU power and is cached (fpi), and I/O throughput is
higher; 2) using the "forcedirectio" option, the opposite of the former.
The "dd" command uses the "read/write" system calls.

Is there any progress on this?

To confirm whether the file cache is being used or not, you can install
the tool bundle "memtool" to assist you:
http://www.solarisinternals.com/si/tools/memtool/index.php
As a command from "memtool", "memps -m" can tell you which files are in
the cache and how much cache each file occupies.

PS: another tool, "directiostat", could also be helpful:
http://www.solarisinternals.com/si/tools/directiostat/index.php

> Is there any progress on this?

I don't understand; if the cache is turned off, physical disk I/O
increases. This is natural, and the performance problems are to be
found elsewhere.

/wfr

Applications such as Oracle, which have their own file I/O management
system, will benefit from directio (file cache disabled), but the
performance of normal file system I/O will take a hit.

posted by Brahma at 3:53 PM 0 comments

graphics monitor and serial console on V440

Subject: Re: graphics monitor and serial console on V440


> We have a V440 running Solaris 9. The system has an XVR-100 graphics
> card and CRT monitor attached. Currently the CRT monitor is acting as
> the console. Is there any way to continue to utilize the CRT monitor
> for logging in and windowing (i.e. continue to see the dtgreet login
> screen and login to CDE) while having another device attached to the
> serial management port act as the console for the system?

Yes, that's pretty common.

Force the console to the device you want via 'input-device' and
'output-device' in the eeprom.
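As a sketch (assuming the serial device you want shows up as ttya on this
system), the variables can be changed from a running Solaris with
eeprom(1M), or with setenv at the ok prompt:

eeprom input-device=ttya
eeprom output-device=ttya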

Then cp /usr/dt/config/Xservers to /etc/dt/config (if you don't have one
there already). Read the examples at the top for the "if no character
device is associated" example and use that at the bottom instead of the
existing line. It'll probably look something like this when you're
done.

:0 Local local_uid@none root /usr/openwin/bin/Xsun :0 -nobanner

The monitor should come alive when dtlogin launches.


posted by Brahma at 3:52 PM 0 comments

Sun warns against putting raw data on s2

It should be noted, though, that Sun warns against putting raw data on s2,
since block zero contains the disk label, and labelling will overwrite the
beginning of your raw data.

Finally, I'll mention that I saw an interesting case many years ago, where
a disk with s0 consisting of the entire disk got unmounted rather abruptly,
to say the least. fsck on s0 complained about a bad superblock, even with
any of the alternate superblock locations. However, I was able to fsck s2
with success, and then mount cleanly. I don't know enough about the guts of
the disk layout to know why this worked, but it did.

On the other hand, changing the length of s2 to something other than the
entire disk is an explicit no-no, according to Sun.
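A quick sketch of checking where the label and the slices sit before
deciding to use a slice for raw data (the device name is a placeholder):

prtvtoc /dev/rdsk/c0t0d0s2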

posted by Brahma at 3:51 PM 0 comments

Understanding Data Link Errors


Understanding Data Link Errors

Many performance issues with NICs can be related to data link errors.
Excessive errors usually indicate a problem. When operating at a
half-duplex setting, some data link errors such as Frame Check Sequence
(FCS), alignment, runts, and collisions are normal. Generally, a one
percent ratio of errors to total traffic is acceptable for half-duplex
connections. If the ratio of errors to input packets is greater than
two or three percent, performance degradation may be noticed. In
half-duplex environments, it is possible for both the switch and the
connected device to sense the wire and transmit at exactly the same
time and result in a collision.

Collisions can cause runts, FCS, and alignment errors due to the frame
not being completely copied to the wire, which results in fragmented
frames. When operating at full-duplex, FCS, Cyclic Redundancy Check
(CRC), alignment errors, and runt counters should be minimal. If the
link is operating at full-duplex, the collision counter is not active.
If the FCS, CRC, alignment, or runt counters are incrementing, check
for a duplex mismatch. Duplex mismatch is a situation where the switch
is operating at full-duplex and the connected device is operating at
half-duplex, or vice versa. The result of a duplex mismatch will be
extremely slow performance, intermittent connectivity, and loss of
connection. Other possible causes of data link errors at full-duplex
are bad cables, a faulty switch port, or NIC software/hardware issues.
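On the Solaris side, a minimal sketch of checking what an hme interface
actually negotiated (instance 0 assumed; other drivers expose similar,
but not identical, ndd parameters):

ndd -set /dev/hme instance 0
ndd /dev/hme link_mode     # 0 = half duplex, 1 = full duplex
ndd /dev/hme link_speed    # 0 = 10 Mbit/s, 1 = 100 Mbit/s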

Explanation of Port Errors (Counter / Description)

Troubleshooting NIC Compatibility Issues on ISUnet

Alignment Errors
Alignment errors are a count of the number of frames received that don't
end with an even number of octets and have a bad CRC.

FCS (Frame Check Sequence)
The FCS error count is the number of frames that were transmitted/received
with a bad checksum (CRC value) in the Ethernet frame. These frames are
dropped and not propagated onto other ports.

Xmit-Err
This is an indication that the internal transmit buffer is full.

Rcv-Err
This is an indication that the receive buffer is full.

UnderSize
These are frames which are smaller than 64 bytes (including FCS) and have a
good FCS value.

Single Collisions
Single collisions are the number of times the transmitting port had one
collision before successfully transmitting the frame to the media.

Multiple Collisions
Multiple collisions are the number of times the transmitting port had more
than one collision before successfully transmitting the frame to the media.

Late Collisions
A late collision occurs when two devices transmit at the same time and
neither side of the connection detects a collision. The reason for this
occurrence is that the time to propagate the signal from one end of the
network to another is longer than the time to put the entire packet on the
network. The two devices that cause the late collision never see that the
other is sending until after it puts the entire packet on the network. Late
collisions are detected by the transmitter after the first "slot time" of
64 byte times. They are only detected during transmissions of packets
longer than 64 bytes. Their detection is exactly the same as for a normal
collision; it just happens late when compared to a normal collision.

Excessive Collisions
Excessive collisions are the number of frames that are dropped after 16
attempts to send the packet resulted in 16 collisions.

Carrier Sense
Carrier Sense occurs every time an Ethernet controller wants to send data,
and the counter is incremented when there is an error in the process.


Runts
These are frames smaller than 64 bytes with a bad FCS value.

Giants
These are frames that are greater than 1518 bytes and have a bad FCS value.

Possible Causes for Incrementing Port Errors (Counter / Possible Cause)

Alignment Errors
These are the result of collisions at half-duplex, duplex mismatch, bad
hardware (NIC, cable or port), or a connected device generating frames that
do not end on an octet boundary and have a bad FCS.

FCS (Frame Check Sequence)
These are the result of collisions at half-duplex, duplex mismatch, bad
hardware (NIC, cable, or port), or a connected device generating frames
with a bad FCS.

Xmit-Err
This is an indication of excessive input rates of traffic. It is also an
indication of the transmit buffer being full. The counter should only
increment in situations where the switch is unable to forward out the port
at the desired rate. Situations such as excessive collisions and 10 megabit
ports will cause the transmit buffer to become full. Increasing the speed
and moving the link partner to full-duplex should minimize this occurrence.

Rcv-Err
This is an indication of excessive output rates of traffic. It is also an
indication of the receive buffer being full. This counter should be zero
unless there is excessive traffic through the switch. In some switches, the
outlost counter has a direct correlation to the Rcv-Err.

UnderSize
This is an indication of a bad frame generated by the connected device.

Single Collisions
This is an indication of a half-duplex configuration.

Multiple Collisions
This is an indication of a half-duplex configuration.


Late Collisions
This is an indication of faulty hardware (NIC, cable, or switch port) or
duplex mismatch.

Excessive Collisions
This is an indication of over-utilization of the switch port at half-duplex
or duplex mismatch.

Carrier Sense
This is an indication of faulty hardware (NIC, cable, or switch port).

Runts
This is an indication of the result of collisions, duplex mismatch, or a
dot1q or ISL configuration issue.

Giants
This is an indication of faulty hardware or a dot1q or ISL configuration
issue.

Additional Troubleshooting for 1000BaseX NICs

Gigabit Auto-Negotiation (No Link to Connected Device)
Gigabit Ethernet has an auto-negotiation procedure that is more extensive
than what is used for 10/100 Mbps Ethernet (Gigabit auto-negotiation spec:
IEEE Std 802.3z-1998). Gigabit auto-negotiation negotiates flow control,
duplex mode, and remote fault information. You must either enable or
disable link negotiation on both ends of the link. Both ends of the link
must be set to the same value or the link will not connect. If either
device does not support Gigabit auto-negotiation, disabling Gigabit
auto-negotiation will force the link up. Disabling auto-negotiation "hides"
link drops and other physical layer problems. Only disable auto-negotiation
to end-devices such as older Gigabit NICs that do not support Gigabit
auto-negotiation. Do not disable auto-negotiation between switches unless
absolutely required, as physical layer problems may go undetected and
result in spanning-tree loops. The alternative to disabling
auto-negotiation is contacting the vendor for a software/hardware upgrade
for IEEE 802.3z Gigabit auto-negotiation support.


posted by Brahma at 3:51 PM 0 comments

Network Configuration configure the driver

Network Configuration

This section describes how to configure the driver after it has been
installed on your system.

To Configure the Host Files

After installing the Sun GigabitEthernet adapter driver software, you
must create a file for the adapter's Ethernet interface. You must also
create both an IP address and a host name for the Ethernet interface
in the /etc/hosts file.

1. At the command line, use the grep command to search the
/etc/path_to_inst file for ge interfaces.

For Sun GigabitEthernet/P:

The following example shows the device instance from an adapter
installed in slot 1.

# grep ge /etc/path_to_inst
"/pci@1f,4000/network@1" 0 "ge"

For Sun GigabitEthernet/S:

The following example shows the device instance from an adapter
installed in slot 0.

# grep ge /etc/path_to_inst
"/sbus@1f,0/network@1" 0 "ge"

2. Create an /etc/hostname.ge<num> file, where <num> is the instance
number of the ge interface you plan to use.

If you wanted to use the adapter's ge interface in the Step 1
example, you would need to create a /etc/hostname.ge0 file, where 0 is
the number of the ge interface. If the instance number were 1, the
file name would be /etc/hostname.ge1.

* Do not create an /etc/hostname.ge<num> file for a Sun
GigabitEthernet adapter interface you plan to leave unused.

* The /etc/hostname.ge<num> file must contain the host name
for the appropriate ge interface.

* The host name should have an IP address and should be
entered in the /etc/hosts file.

* The host name should be different from any other host name
of any other interface: for example, /etc/hostname.ge0 and
/etc/hostname.ge1 cannot share the same host name.

The following example shows the /etc/hostname.ge<num> files
required for a system called zardoz that has a Sun GigabitEthernet
adapter (zardoz-11).

# cat /etc/hostname.ge0
zardoz
# cat /etc/hostname.ge1
zardoz-11

3. Create an appropriate entry in the /etc/hosts file for each
active ge interface.

For example:

# cat /etc/hosts
#
# Internet host table
#
127.0.0.1     localhost
129.144.10.57 zardoz loghost
129.144.11.83 zardoz-11

4. If your system does not support Dynamic Reconfiguration (DR), reboot.
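Where a reboot is not desired, a sketch of plumbing the new interface by
hand (reusing the zardoz-11/ge1 names from the example above) would be:

ifconfig ge1 plumb
ifconfig ge1 zardoz-11 netmask + broadcast + up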

posted by Brahma at 3:45 PM 0 comments


Ethernet Jumbo Frame Configuration Procedure

Ethernet Jumbo Frame Configuration Procedure

No configuration changes are required in the iSCSI driver or in the
Solaris operating system to support jumbo frames. Instead,
configuration changes must be made in the Network Interface Card
(NIC), through the use of the configuration interface and tools that
are provided by the NIC manufacturer. Not all NICs support jumbo
frames, so check with your manufacturer to verify that this feature is
supported. Also, the network equipment (Ethernet switches, routers,
and so forth) between the host and the SN 5400 will have to be
configured to accept jumbo frames, because most equipment does not
support this capability by default.

Performance Improvement Techniques

These are the most likely reasons that iSCSI network performance is
lower than expected:

* Flow control has not been enabled on the NIC card in the host.

* Flow control has not been enabled on one or more of the switches
in the Ethernet network that is between the host and the SN 5400.

To see these problems, observe either the retransmit timeout or the
data packets retransmitted counters in the SN 5400. You can also
observe the TCP segments retransmitted counter: issue the netstat -s
command in Solaris.
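A minimal sketch of checking those counters from the Solaris host (the
counter names are the ones netstat prints for the TCP MIB):

netstat -s -P tcp | egrep 'tcpRetransSegs|tcpRetransBytes'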

Many NICs and switches are shipped with pause frames (flow control)
disabled. If you enable pause frames on all Gigabit Ethernet
interfaces (the host system and all network switches), then you will
help reduce dropped packets: a source of significant performance
degradation. By default, pause frames are enabled on the SN 5400 and
are not user-configurable.

If your server uses a Sun Gigabit Ethernet adapter NIC, then use this
procedure to enable flow control (by default, receive flow control is
disabled for the Sun adapter):

1. Add this line to the /kernel/drv/ge.conf file (by default, this
file does not exist):

adv_pauseTX=1;

2. Reboot the Solaris host.

3. Issue this command to verify that adv_pauseTX is set properly:

ndd /dev/ge adv_pauseTX

The NIC should respond with a value of 1.

posted by Brahma at 3:44 PM 0 comments

Troubleshooting Difficulties Network

Troubleshooting Difficulties

Disabling autonegotiation can result in physical layer problems going
undetected. Link partner, cable problems, and other Data Link layer issues
are hidden from the administrator, and manual examination of driver
statistics is required.

* Unable to detect bad cables
* Unable to detect link failures
* Unable to check link partner's capabilities
* Unable to move systems from one port to another or to another switch or router
* Unable to determine performance issues on higher layer applications
* Unable to implement Pause Frames (Flow Control)
* Difficulties in determining where the system has a forced setting
configured (/etc/system and driver.conf, using ndd in a startup script)

Link syncing between link partners may not happen and the link may not come
up when autonegotiation is absent on 100BASE-T (UTP) copper.

Example of hme interface with duplex mismatch:
hme0 has negotiated and failed back to HDX and is experiencing crc, babble
and late_collisions.

Check that port negotiation is enabled on the switch:

Console> show port negotiation 4/1
Port   Link Negotiation
-----  ----------------
 4/1   enabled

# kstat -p hme:0::'/collisions|framing|crc|code_violations|tx_late_collisions/'
hme:0:hme0:code_violations      0
hme:0:hme0:collisions           16720
hme:0:hme0:crc                  0
hme:0:hme0:framing              0
hme:0:hme0:tx_late_collisions   5706

Example of hme interface with duplex mismatch:
hme1 is forced to FDX and experiencing framing, crc, and code_violation errors.

# kstat -p hme:1::'/collisions|framing|crc|code_violations|tx_late_collisions/'
hme:1:hme1:code_violations      147
hme:1:hme1:collisions           0
hme:1:hme1:crc                  283
hme:1:hme1:framing              8
hme:1:hme1:tx_late_collisions   0

Example of switch port with duplex mismatch:
Port 11/22 has been forced to FDX (full duplex) but the link partner is in
HDX (half duplex), resulting in FCS (Frame Check Sequence) and runt errors.

Console> show port counters 11/22
Port   Align-Err  FCS-Err    Xmit-Err   Rcv-Err    UnderSize
-----  ---------- ---------- ---------- ---------- ---------
11/22           0     572968          0          0         0

Port   Single-Col Multi-Coll Late-Coll  Excess-Col Carri-Sen Runts     Giants
-----  ---------- ---------- ---------- ---------- --------- --------- ------
11/22           0          0          0          0         0   9765322      0

(Source: Ethernet Autonegotiation Best Practices, July 2004)

posted by Brahma at 3:44 PM 0 comments

Strange fsck behaviour with alternated superblock

Subject: Re: Strange fsck behaviour with alternated superblock

JBR wrote:
> Why is it asking "FREE BLK COUNT(S) WRONG IN SUPERBLK SALVAGE?" when doing
> fsck with an alternate superblock?
> Is this normal behaviour?

Yes, this is normal. The dynamic data is only updated (or only
regularly updated) in the main superblock, I think, so the other ones
get stale pretty fast. Really, they're a repository of layout
information about the FS so you can find things like cgs &c in an
emergency.

--tim

> I'm using Solaris 9 on a V440 with a Sun StorEdge 3310.
> I have a strange behaviour when using fsck with an alternate superblock:

> First make a filesystem:

> # newfs /dev/md/rdsk/d21
> newfs: /dev/md/rdsk/d21 last mounted as /afs7
> newfs: construct a new file system /dev/md/rdsk/d21: (y/n)? y
> /dev/md/rdsk/d21: 1430192192 sectors in 65533 cylinders of 64 tracks,
> 341 sectors
> 698336.0MB in 13107 cyl groups (5 c/g, 53.28MB/g, 6528 i/g)
> super-block backups (for fsck -F ufs -o b=#) at:
> 32, 109504, 218976, 328448, 437920, 547392, 656864, 766336, 875808,
> 985280,
> Initializing cylinder groups:
> ...........................................................................
> super-block backups for last 10 cylinder groups at:
> 1429159104, 1429268576, 1429378048, 1429487520, 1429596992, 1429706464,
> 1429815936, 1429925408, 1430034880, 1430144352,

> Do normal fsck:

> # fsck -F ufs /dev/md/rdsk/d21
> ** /dev/md/rdsk/d21
> ** Last Mounted on /afs7
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 2 files, 9 used, 704125298 free (10 frags, 88015661 blocks, 0.0%
> fragmentation)

> Mount the filesystem:

> # mount /afs7

> Unmount without doing anything on the filesystem:

> # umount /afs7

> Do fsck with an alternate superblock:

> # fsck -F ufs -o b=109504 /dev/md/rdsk/d21
> Alternate super block location: 109504.
> ** /dev/md/rdsk/d21
> ** Last Mounted on
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups

> FREE BLK COUNT(S) WRONG IN SUPERBLK
> SALVAGE? y

> 2 files, 9 used, 704190842 free (10 frags, 88023854 blocks, 0.0%
> fragmentation)

> ***** FILE SYSTEM WAS MODIFIED *****

> Again same fsck without mounting fs:

> # fsck -F ufs -o b=109504 /dev/md/rdsk/d21
> Alternate super block location: 109504.
> ** /dev/md/rdsk/d21
> ** Last Mounted on
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 2 files, 9 used, 704190842 free (10 frags, 88023854 blocks, 0.0%
> fragmentation)

> ***** FILE SYSTEM WAS MODIFIED *****

> Modified , without asking something!

> When doing a normal fsck everything is OK, but when using an alternate
> superblock (it makes no difference which one) it is always asking the
> salvage question. It also makes no difference if the filesystem is
> mounted or not mounted between fscks.

> Why is it asking "FREE BLK COUNT(S) WRONG IN SUPERBLK SALVAGE?" when
> doing fsck with an alternate superblock?
> Is this normal behaviour?

Yes, the alternate SBs are only updated during mkfs_ufs(1M),
tunefs(1M) and growfs(1M) - they are not kept updated during
regular use as far as dynamic changes to allocations are concerned.
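If you ever need the list of backup superblock locations again for an
existing filesystem, a non-destructive sketch (the -N flag makes newfs
print the parameters without creating anything):

newfs -N /dev/md/rdsk/d21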

---frankB


posted by Brahma at 3:42 PM 0 comments

Veritas Volume Manager ! Changing Hostname

Veritas Volume Manager !

What are the effects of a *hostname change* on a server
which has all of its drives, including root/swap,
encapsulated by Veritas Volume Manager??

I have had a very bad experience with LSM (the Digital version of
Veritas Volume Manager) very recently.

Gopala,

Veritas VM assigns ownership to objects based on hostname; if you change
the hostname, the system will no longer own the objects.

Ron

> Ron

If you forget to export the volume group, you can manually change
the volume group ownership to allow it to be reimported.

> > Any discussion on this will be highly appreciated.

It's not clear what the problem that you ran into was. Encapsulated
volume groups aren't the cleanest part of Veritas Volume Manager.
Worst case, don't start the volume manager, manually delete anything
you can find of the encapsulated root volume group, restart the volume
manager and re-encapsulate the root volume group. I can't give you
specifics since I don't have a volume manager running currently.

If you had a stand-alone boot disk with a copy of the volume manager on
it, you could use it to change host ownership.

If it turns out that Veritas can't handle hostname changes, you could
always have deleted the root volume group beforehand. Deleting volume
manager entities doesn't delete the actual data.

Offhand, based on what you haven't told us, that's all I can think of.


I have changed the host name of a machine running VM 2.6 (under Solaris 2.6)
and nothing happens, except that some processes are running with the old
name, but all works fine.

Check out man vxdctl

The command is vxdctl init newname

vxdctl init new_name

To confirm the change (and you may want to save the file before the
change, just in case ...), look in /etc/vx/volboot
(this is the name Veritas is going to use during imports).
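Put together, a sketch of that procedure (new_name stands for whatever the
new hostname will be):

cp /etc/vx/volboot /etc/vx/volboot.orig   # keep a copy, just in case
vxdctl init new_name
vxdctl list                               # the hostid shown should now be new_name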

Good Luck.

Jeff Robinson / HDG

> encapsulated by Veritas Volume Manager ??

None at all. VxVM doesn't care a whit what the hostname is.
Now, if your hostid changes, you'll have licensing issues and
diskgroup attachment issues, but that's not what you asked.

[Doug Hughes]

> None at all. VxVM doesn't care a whit what the hostname is.

Oh, if only that were true. VxVM didn't find the A1000 until we
changed the hostname to a non-FQDN. Now, why it cares is a mystery.
I wouldn't take any chances with that software...

> diskgroup attachment issues, but that's not what you asked.

Changing the hostname will not affect the volumes you've previously
installed unless you issue a vxdctl init command, which will change the
output of vxdctl list (the hostid in there); this will cause VxVM to
think the disks are owned by someone else.

> I wouldn't take any chances with that software...

That is a problem with rm6, the software that manages the A1000, not
with VxVM.

I believe that the latest rm6 fixes this, not 100% sure though.
--

Actually what is happening is: as I said, it is called LSM (Logical
Storage Manager, the Digital VxVM). I start the server in single user
mode, then I can see the contents of the /etc/vol/tempdb directory
(which contains all the disk groups listed). Once I start the Veritas
services it locks out this directory. And every service depends on the
contents of this directory. Thus all disk groups disappear from the
picture, including the vold (on Sun, VXLD) daemon which is needed by all
services and tools on Veritas. My question is: did it ever happen to
anyone, wherein some directories/file systems, encapsulated by VxVM, are
not accessible in *run level 3* and are accessible in *run level 1* or
when mounted as read-only??


posted by Brahma at 3:42 PM 0 comments

Effective throughput can I expect on a Gigabit Ethernet link with UltraSparc-3 CPU

As there were many interesting answers, I will quote all of them.
Basically, I could expect near 100 Mbytes/s with Jumbo Frames under SunFire
(this does not take the disk backend into account, just memory-to-memory
transfer). Thank you all.

--------------------------------------------------------------
Joe Fletcher's answers:

On a V880 8x900 we did some basic tests using ftp which gave us about
45MB/s. This put about a 10-15% overhead on the machine (ie it takes about
a whole UltraIII cpu to drive the card in any serious sense). This is
dumping data from an FC array down separate HBAs to another array volume...

Just checked some old results from another site I used to run. Probably not
very interesting to you but on an Alpha ES40 4xEV6 serving a group of Intel
clients we managed to get about 80MB/s. The Alpha was linked into a 3COM
switch via gigabit with the clients each on a 100FD port on the same switch.
Each client was transferring a different set of files, some via ftp, some
via the SMB server software (ASU). We could get similar results using two
Alphas with memory filesystems mounted which allowed us to get the storage
out of the picture. Not representative of real world particularly but we
just wanted to see how fast it was capable of going. I suspect the file
caching helped quite a lot where the PC clients were concerned.
--------------------------------------------------------------
Christophe Dupre's answer:

What ethernet card do you have in your server? Sun has at least two
chipsets used in gigabit cards: GEM (with interface as ge0) and Cassini
(Sun GigaSwift, ce0 interface). The GEM is older and pretty much all the
processing is done by the CPU, and the throughput isn't that great. The
Cassini is much better and offloads some processing (IP CRC and TCP CRC)
to the card, yielding much better throughput. Note that GEM is only
1000BaseSX, while Cassini does both fiber and copper.

What do you use to compute the throughput? I use iperf and between two
servers (both UltraSPARC-2 400MHz, both dual CPU), both having GEM-based
cards connected to a Cisco 4506 switch, I get 85Mbit/s for a single
connection, and an aggregate of 94Mbit/s with about 40% kernel time
according to top. This is using an MTU of 1500 (the GEM and Cisco switch
don't do jumbo frames). The TCP window size was 64KByte. By comparison,
iperf runs between a Sun UltraSPARC3 with a Sun GigaSwift and a Dell
PowerEdge 2650 with a Broadcom 1000TX card connected using the same Cisco
Catalyst and 48KByte TCP windows yield 480MBit/s.

So before upgrading the CPU you should make sure you have a card that
offloads the CPU like the GigaSwift. Next, jumbo frames don't matter much -
support is not standardized, not much equipment supports it, and you can
get pretty good performance without. I'm not sure how much the CPU speed
is needed, though. I'll install a GigaSwift in an UltraSPARC2 soon, I can
tell you the performance difference then.
--------------------------------------------------------------
Jason Santos's answer:

I would suspect that your bottleneck on the E10K would be the SBUS
interface, not CPU speed. With a gem or GigaSwift PCI card in a 750MHz
6800, we get about 60MB/s over NFS with a single thread. Raw UDP or TCP
throughput would be much higher, although I never tested it. Let me test
now, stand by... This is a quick test from a 4x750MHz 6800 to a 4x1200MHz
V880 (no network tuning, single thread):

ttcp-t: buflen2768, nbuf 48, align384/0, portP01 tcp -> nbmaster
ttcp-t: socket
ttcp-t: connect
ttcp-t: 1073741824 bytes in 23.59 real seconds = 44441.28 KB/sec +++
ttcp-t: 1073741824 bytes in 23.16 CPU seconds = 45275.30 KB/cpu sec
ttcp-t: 32768 I/O calls, msec/call = 0.74, calls/sec = 1388.79
ttcp-t: 0.1user 23.0sys 0:23real 98% 0i+0d 0maxrss 0+0pf 3756+261csw
ttcp-t: buffer address 0x74000

The fastest Gigabit transfers I have ever seen were from an IBM x345 (dual
Intel Xeon 2.4GHz) over NFS to a NetApp FAS960; I was able to get over
100MB/sec, which is 80% of the theoretical max of 125MB/sec.
--------------------------------------------------------------
Paul Theodoropoulos's answer:

Sun's 'Rule of Thumb' from the UltraSPARC II era was that you should have
300Mhz of UltraSPARC II horsepower per gigabit adapter. That's 'dedicated'
horsepower - if you had one 300Mhz cpu and one gigabit adapter, you'd have
no horsepower to spare for your applications. In practice of course, the
gigabit gets throttled down and the horsepower shared. But I would expect
approximately the same performance requirements with UltraSPARC III,
frankly.
--------------------------------------------------------------
Alex Madden's answer:

http://www.sun.com/blueprints/0203/817-1657.pdf
--------------------------------------------------------------
JV's answer:

#2) Throughput may depend more on the underlying storage architecture's
ability to READ. You will get better results with hardware RAID 0/1 than
with software RAID like Disksuite or VxVM.
#3) Copper or optical gigE? I use optical, but I just got V240s last month
so I am beginning to experiment with their ce interfaces.
#4) On optical ge, with 14 column Veritas stripes, on large-ish dbf files
(1.5-2GB), 6x336Mhz cpus, I can get 45 MB/sec with 35% sys. I haven't had
a chance to tune and test my 10-12 cpu UltraSPARC-II (optical) or 2 cpu
UltraSPARC-III V240 (copper ce) boxes.

--------------------------------------------------------------
Tim Chipman's answer:

You might want to use the "ttcp" utility to test TCP bandwidth throughput.
It is more likely to represent "best case scenario" throughput that is
in keeping with statements like "gig-ether can do 100Mbytes/sec" :-)

We did a bit of testing here a while back, and I'm appending the info
below as a general reference, for what use it may be. Test boxes were:

athlon MP running either Solaris x86 OR Linux
UltraSPARC II running Solaris 8

Note, based on my experience, it seems unlikely you will ever get "real
world data xfer" much above 50-55 Mbytes/sec over gig-ether. "ttcp"
benchmarks are one thing, but real-world protocols are another.

NOTE: testing done here using two dual-athlon systems, identified as follows:

wulftest = redhat 8 (dual-1800mhz, 1 gig ram, 64-bit PCI)
wulf2 = redhat 8 (dual-2000mhz, 1 gig ram, 64-bit PCI)
thore = solaris8 x86 (dual-2000mhz, 1 gig ram, 64-bit PCI)
(note - Wulf2 & Thore are actually the same system with 2 different HDDs
to boot the alternate OSes)
ultra5 = 270mhz Ultra5 (nb, 32-bit PCI bandwidth only)

Gig-ether NICs being tested are all 64-bit PCI / Cat5 cards:

Syskonnect SK-9821
3Com 3C996B-T (BroadCom chipset)

(note, we had 2 x SK nics and 1 x 3com on-hand, so didn't test 3com<->3com
performance.)

Software being used for testing was (1) TTCP and (2) Netbackup (for info on
TTCP, visit the URL: http://www.pcausa.com/Utilities/pcattcp.htm )
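For reference, a rough sketch of how a memory-to-memory ttcp run is usually
invoked (classic ttcp options; "receiver" is a placeholder hostname and the
buffer count/size are just example values):

# on the receiving box
ttcp -r -s
# on the sending box
ttcp -t -s -l 32768 -n 32768 receiver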

Parameters tuned include Jumbo Frames (MTU of 1500 vs 9000) and
combinations of NIC<->NIC and system<->system.

Connection between NICs was made with a crossover cable, appropriately
wired (all strands) such that Gig-ether was operational.

Note these ##'s are NOT "comprehensive", ie, NOT every combination of
tuneable parameters has been attempted / documented here. Sorry about that.

Hopefully, " something is better than nothing ".

[TTCP results]

SysKonnect <-> SysKonnect = 77 MB/s


* Wulftest with Syskonnect (Redhat 8)
* Thore with Syskonnect (Solaris x86)
* Jumbo frames don't affect speed, but offload the systems by around
20-40% for CPU loading.

SysKonnect <-> 3COM = 78 MB/s
* Wulftest with Syskonnect (Redhat 8)
* Wulf2 with 3com (Redhat 8)
* MTU = 1500

SysKonnect <-> 3COM = 97 MB/s
* Wulftest with Syskonnect (Redhat 8)
* Wulf2 with 3com (Redhat 8)
* MTU = 9000

ULTRA5 <-> Wulftest tests with TTCP (SysKonnect <-> Syskonnect NICs):

with JumboFrames:
* 25% CPU load on Ultra5, 29 MB/s

without JumboFrames:
* 60% CPU load on Ultra5, 17 MB/s

[Netbackup results]

Large ASCII file (5 gigs) = 50 MB/s
* Wulftest with SysKonnect (Redhat 8)
* Thore with 3COM (Solaris x86)
* MTU 1500

System backup (OS files, binaries) = 11 MB/s
* Wulftest with SysKonnect (Redhat 8)
* Thore with 3COM (Solaris x86)
* MTU 1500

ORIGINAL QUESTION :

Hi,

Basic question is: what effective throughput can I expect on a Gigabit
Ethernet link with an UltraSparc-3 CPU, with or without Jumbo Frame support,
with or without multithreaded transfer?

I ask this because with UltraSparc-2 CPUs (E10K) and a GE link (without
Jumbo Frame support) we couldn't get more than:
- 15 Mbytes/s with monothreaded transfer
- 55 Mbytes/s with multithreaded transfer (the best rate was reached
with 10 threads)


(We measured application throughput, that is to say TCP throughput).

As you see, the CPU overhead with a 1500 MTU was so high (truss showed 80%
kernel) that we had to multithread the transfer to reach the best
throughput (55 Mbytes/s). Unfortunately we were far from the theoretical
limit (100 Mbytes/s?), even if there were still CPU resources free (50%),
and I can't determine if it was caused by the small MTU, the poor US-2
throughput or both.

I think that Jumbo Frames could increase the throughput and lower the CPU
overhead, but by how much? Will the US-3 throughput help much? Is there
any chance to reach the 100 Mbytes/s limit?

Thanks for your feedback, I will summarize.
---
Sebastien DAUBIGNE

posted by Brahma at 3:38 PM 5 comments

Tuning the Sun Gigabit Ethernet Adapter

Appendix B: Tuning the Sun Gigabit Ethernet Adapter

a. Create a file named /etc/rc2.d/S99netperf
b. Add the following lines:
   i.   /usr/sbin/ndd -set /dev/ge instance 0
   ii.  /usr/sbin/ndd -set /dev/ge adv_pauseTX 1
   iii. /usr/sbin/ndd -set /dev/ge adv_1000autoneg_cap 0
   iv.  /usr/sbin/ndd -set /dev/ge adv_1000fdx_cap 1
c. If you have more than one "ge" adapter on your server, you can either
   omit item b.i. and the settings will affect all instances, or specify
   the instance you want to refer to.
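Assembled into a file, a sketch of what that startup script could look like
(the instance line is the optional part from item b.i.):

#!/sbin/sh
# /etc/rc2.d/S99netperf - force ge0 to 1000FDX with transmit pause frames
/usr/sbin/ndd -set /dev/ge instance 0
/usr/sbin/ndd -set /dev/ge adv_pauseTX 1
/usr/sbin/ndd -set /dev/ge adv_1000autoneg_cap 0
/usr/sbin/ndd -set /dev/ge adv_1000fdx_cap 1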

posted by Brahma at 3:37 PM 0 comments

Disk Failure 880

A number of failures on a new machine with a number of disks is quite
common. If this is a two-disk V880, I'd say you have a problem. If it's
a 12-disk V880, I'd call Sun, swap out the disk and shrug it off.

If you look at the probability of disk failure over time, there is an
initial peak as new disks spin in, then several years of low
probability, then a ramping up as they get towards the end of their
serviceable life.

directio shouldn't make a difference. However, I would check the
ambient temperature inside and around the machine (emphasis on
temperature stability rather than value), ensure that it is not being
bumped or subject to (excess) vibration, note the humidity of the room
and check for dust.

posted by Brahma at 3:36 PM 0 comments

High Kernel Usage

Bernd Haug <[email protected]> wrote:
> [email protected] <[email protected]> wrote:
>> TOP shows Kernel has been using over 55% CPU.
>> How can we find what are those kernel's processes? why do those use a
>> lot of CPU? Normally i only see about 10%.
> Wouldn't spending some time in syscalls be pretty normal on a DB
> server?

Not that much.

> I rather think it's funny that you have 0 iowait.
> Maybe something went wrong and now your top conflates the two into
> kernel time?

Remember that IOwait is only a subset of idle time. If you have no idle
time, then you can't display any IOwait (even if the disks are slow).

I'd look to see if the "last PID" is increasing rapidly. A constantly
forking program that creates new processes which are quickly reaped will
eat up tons of system time. Difficult to trace directly (well, without
Solaris 10 and dtrace).

Take some guesses and run 'truss' looking for forks.
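A sketch of that approach, with <pid> standing in for whichever parent
process looks suspicious:

ps -eo pid,ppid,etime,comm | sort -n | tail   # freshly created, short-lived PIDs stand out
truss -f -p <pid> 2>&1 | grep -i fork         # follow children and watch for fork activity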

posted by Brahma at 3:36 PM 0 comments

corelation between %iowait from top and %busy or avwait from sar -d

MikeHT <[email protected]> wrote:
> I would like to know from the list if there is any correlation between
> %iowait from top and %busy or avwait from sar -d # #?

No, not necessarily.

> Does it mean that if %busy is high, then the %iowait
> will be high too?

No.

%busy is a description of what's happening on a disk (or the driver
talking to the disk). %iowait is a description of what's happening on
the CPU.

In some cases they may be correlated, but that's not necessarily true.
There are many cases where they would not be.

> But that correlation is not consistent based on the data captured from
> machines A & B. Machine B's devices' %busy is high, but %iowait is
> lower.

Right.

> How is %iowait calculated in top? Any insights? Thanks.

%iowait is the average fraction of time that the CPU is idle and that
the system has at least one outstanding I/O request. The outstanding
I/O may or may not have anything to do with your %busy disk.

Note that a system may be completely I/O swamped while no individual
disk is all that busy. Also, you could have a disk that shows 100%
busy, but it is meanwhile able to keep up with requests very rapidly.

So these numbers may not mean a lot in isolation. They're most useful
when looking at changes over time, or relationships between components.

Note that iowait is a subset of 'idle'. If you can keep the CPUs busy by
working them harder, they'll never display iowait. That's a major
difference between A and B below. B may be more swamped while at the
same time doing more CPU jobs (so that it has less idle time).

--

Thanks Darren for the reply.

High %iowait could also indicate that the path to the disk devices (EMC
Symmetrix in this case) could be problematic? We are using EMC
PowerPath.

Reply

MikeHT <[email protected]> wrote:
> Thanks Darren for the reply.
> High %iowait could also indicate that path to the disk devices (EMC
> symmetric in this case) could be problematic?

It *could*, but it probably doesn't.

%iowait is a very visible number that doesn't often indicate a problem.
A well-tuned, perfectly reasonable machine may still have somewhat
elevated iowait figures during normal operation. Many admins try to
"fix" it when they shouldn't. (Note that because of this and a few
other reasons, Solaris 10 apparently does not track this any longer, so
it will always appear as zero.)

Look for problems by analyzing performance, not by looking at iowait.
Does the application work? Does it respond in good time? Are the
throughput and latency figures on your storage what you expect?

The iostat figures are much more likely to be relevant than the system
iowait number.
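For instance, a minimal sketch of watching per-device behaviour instead of
the system-wide figure:

iostat -xn 5    # per-device service times, queue lengths and %busy every 5 seconds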

posted by Brahma at 3:35 PM 0 comments

/var fills when using pkgadd... query

/var fills when using pkgadd... query

Solaris 8 on a SPARC 20 (!)

I have a SPARC 20 that will do a very nice job for me - I just need to
install Samba on it. It has a 1 GB disk that is carved up (historical
installation). When I attempt to install the Samba package it fails
because the fairly small /var partition fills.

I added a 4 GB disk I had handy and created a 4 GB partition (/data) to
play with... I have shelved off various areas of /var to /data to
leave more space in /var but even this doesn't solve the matter.

So - the crux of my query. Although Samba is being installed into a
completely separate partition to /var, what is it in /var that is
filling (ie being used)? I've hived off /var/spool/pkg to /data
utilising a soft link, but that doesn't solve it.

So

1) what is being used in /var anyway

2) is there anything I can do with my pkgadd to avoid use of /var at all? (-R doesn't solve it for instance...)

> 1) what is being used in /var anyway

/var/tmp

> 2) is there anything I can do with my pkgadd to avoid use of /var at
> all? (-R doesn't solve it for instance...)

set $TMPDIR to a location where you have an adequate amount of space.

See pkgadd(1M), 2d paragraph under DESCRIPTION.
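For example, a minimal sketch of that TMPDIR approach (the /data paths and the package file name are only examples):

# point pkgadd's temporary area at the roomier /data partition
TMPDIR=/data/tmp
export TMPDIR
mkdir -p $TMPDIR
pkgadd -d /data/pkgs/samba.pkg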

Reply

>utilising a soft link, but that doesn't solve it.

/var/sadm

Casper

posted by Brahma at 3:34 PM 0 comments

ufsdump warning

> lolo wrote:
>> When I make a ufsdump
>> /usr/sbin/ufsdump 0uf Server1:/dev/rmt/0n /export/home

>> I have this message

>> DUMP: Dumping (Pass IV) [regular files]
>> DUMP: Warning - block 1543503872 is beyond the end of `/dev/md/rdsk/d3'
> that's a typical error when you try to backup a filesystem that is
> active. Either unmount /export/home or check "man fssnap"


I've never seen that error on simply an active filesystem. If the filesystem is larger than d3, you can get that message.

To the OP, can you post 'metastat d3', and 'fstyp -v /dev/md/rdsk/d3 | grep size', and prtvtoc info for the disk or disks holding d3?
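For reference, the fssnap route mentioned above looks roughly like this; a minimal sketch (the tape device and backing-store path are only examples):

# create a snapshot of the live filesystem; fssnap prints the snapshot device
fssnap -F ufs -o bs=/var/tmp /export/home
# dump the consistent, read-only snapshot instead of the live filesystem
ufsdump 0uf /dev/rmt/0n /dev/rfssnap/0
# remove the snapshot when the dump has finished
fssnap -d /export/home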

posted by Brahma at 3:33 PM 0 comments

Password Aging, Part 2

Password Aging, Part 2

If you're starting with a group of users who have been active for a long time and not had their passwords aged, how should you go about introducing password aging?

To start, you might first take a look at the dates on which your users' passwords were last changed. To view the dates by themselves, you might use a command such as this (run as root):

# cat /etc/shadow | awk -F: '{print $3}' | sort -n | uniq -c

This command sorts the lastchg (last time the password was changed) field numerically and prints out the number of records with each particular date value.

Of course, the dates in this command's output are going to be presented to you as a list of numbers (rather than recognizable dates). You will see something that looks more or less like this:

   7 6445
   1 11289
   2 11632
  53 11676
   5 11677
   2 11683
   1 11849
   2 12038
  23 12345
   1 12881
   1 13062

These numbers are a little hard to interpret, but the range of values and the "popular" values suggest that most users on this system have not changed their passwords in a very long time and that many of them might have last changed their passwords in response to a request to do so (since two groups of people changed their passwords on the same two days).

But let's try to pin these numbers down and get an idea what dates we are really looking at. How do you do this? Well, if you have the GNU date command installed on your system, you can view today's date with a command such as this:

% expr `date +%s` / 86400

Alternately, you can package this date conversion command in a script such as that shown below, call it "today" and run it whenever you want to know what the current date looks like in the days-since-the-epoch format. If you're reading this column on the day that it was first published, that value would be 13062.

#!/usr/bin/perl -w
# today: a script to print date in days-since-epoch format

$now=`/usr/local/bin/date +%s`;
$_=$now / 86400;
($today)=/(\d+)./;  # number of days since 01/01/1970

print "$today\n";

In both the command and the "today" script, we use the "date +%s" command to produce the current date/time as the number of seconds since midnight on January 1, 1970. We then divide this value by the number of seconds in a day (86,400) to convert this value to the number of days since January 1, 1970. The commented line lops off the digits on the right side of the decimal point (along with the decimal point itself). This gives us a value for today.

To determine how long ago one of the other dates in the lastchg list above happened to be, we can use an expr to calculate the number of days between today and the date the password was last changed. Let's choose the most popular value (line 4) for this:

# expr 13062 - 11676
1386

That's 1,386 days ago -- nearly four years! NOTE: The shadow records with 6445 in the lastchg field are disabled accounts and, thus, don't factor into our password aging concerns.


If the bulk of your users have the same last-set date, they have probably never changed their passwords -- or never changed them since they were last required to do so. Whenever you change a user's password or one of your users changes his own password, that field in the /etc/shadow file will be updated.

So, how do you introduce password aging in a situation such as this? If you add a max value when a user's password hasn't been reset for nearly four years, chances are that his password will already be expired and he will not be able to log in.

A better approach would be to initiate password aging by modifying the lastchg date in your shadow records and then selecting a max value that will give your users time to change their passwords before they run out of time. You should also publish notices explaining the change and focusing your users' attention on the need to change their passwords from time to time.

For example, if you make the lastchg date of a record five months in the past and then require that the user change his password every six months, this would give him a month to change his password before he is locked out. And, from that point forward, he would need to change his password every six months.

Five months in the past would roughly put the (fictitious) lastchg date at 12912 (13062 - (5 * 30)). A shadow entry such as that shown below would, therefore, force sbob to change his password within the month and would give him a month's worth of warnings before he's locked out of his account:

sbob:dZlJpUNyyusab:12912:30:180:30:::

On login, sbob would see something like this:

Your password will expire in 30 days.
Last login: Wed Oct 6 16:28:34 2005 from corp.particles.com
Sun Microsystems Inc. SunOS 5.8 Generic February 2005
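Note that the aging fields used in the sbob entry above (min, max and warn days), though not the lastchg date itself, can also be set with the passwd command rather than by editing /etc/shadow by hand; a minimal sketch matching that entry:

# -n min days, -x max days, -w warning days
passwd -n 30 -x 180 -w 30 sbob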

If you've never used password aging before, it's probably a good idea to get your users' attention to the fact that passwords are going to expire. The one-line warning above may not be enough to get your users' attention. Perhaps a notice like this in your /etc/motd file would be more effective:


>>> Passwords must be changed every 6 months <<<
>>> Look for password expiration information <<<
>>> in the system output above             <<<

When a message like this is displayed on login for a month, your users are likely to notice and take action before their passwords expire.

You can also change the default settings for password aging in the /etc/default/passwd file. For example, if you want users to be required to keep a password for a month and change it every 6 months, your values might look like this:

MAXWEEKS=26
MINWEEKS=4
PASSLENGTH=6

Next week, we will look at a script that analyzes the aging parameters in the /etc/shadow file and warns you about users who are getting close to their password expiration dates.

posted by Brahma at 3:33 PM 0 comments

Disk performance

UNIX Museum wrote:
> Hi there,
>
> I am trying to understand why my SB1K gives me abysmal performance on
> sequential read/write...
>
> So, it appears to me that the culprit is the scratch filesystem... I
> tried to mount the drive with forcedirectio, but it looks like it made
> things even worse... Does anybody have any suggestion?

Yepp, forcedirectio is for databases that do their own caching.

If you are doing very large files you should:

1. make the filesystem with fragsize set equal to blocksize
2. set cylinder group size to 32
3. set the "maxbpg" to 50% or 99% of the amount of blocks in a cylinder group

Look at fstyp(1M) and tunefs(1M) and newfs(1M)

//Lars
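As an illustration of the tuning described above, a minimal sketch (the device name and the maxbpg figure are only examples; derive maxbpg from the blocks-per-cylinder-group value that fstyp reports):

# 8K blocks with fragsize equal to blocksize, 32 cylinders per group
newfs -b 8192 -f 8192 -c 32 /dev/rdsk/c1t1d0s6
# inspect the resulting geometry (look for the bpg/fpg figures)
fstyp -v /dev/rdsk/c1t1d0s6 | head -20
# then raise maxbpg on the existing filesystem
tunefs -e 6000 /dev/rdsk/c1t1d0s6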

posted by Brahma at 3:32 PM 0 comments


Apache installation

Greetings,
>
> I know this is a Solaris forum, but I'm having problems installing apache on Solaris 9; maybe someone on solaris-l has experience installing apache. I've done an exhaustive search using Google; no useful information was returned. Listed below are the system specifics:
>
> Apache version: 2.0.48
> Compiled Apache source using gcc 3.4.4
> OS: Solaris 9
>
> I used the following commands to compile the apache source:
>
> 1. ./configure --prefix=/mp1/software/apache-2.0.48 --enable-module=so
> 2. make
> 3. make install
>
> The above three steps completed without error. The configure option of "--enable-module=so" was listed in my installation guide (not apache doc.) as being necessary.
>
> When I attempt to start apache by using "apachectl start", the following error is returned:
>
> Illegal instruction - core dumped
>
> The error is being generated when attempting to run the "httpd" command. All of the other commands in $APACHE_HOME/bin load and run without errors.
>
> Any help to get apache up and running would be greatly appreciated.
>
> Mike Badar

Re: Apache installation
Posted By Luis Hansen

I'm using apache 2.0.54 with Solaris 9. I compiled:
./configure --enable-module=so --enable-module=all --with-mpm=worker --enable-shared=max --prefix=/usr/local/apache2054

With mpm=worker apache works in fork & thread mode; this MPM is a hybrid.


Check in /usr/local/lib whether some library is missing.

The user and group must be set to nobody in the httpd.conf file.

check your configuration with ./apachectl configtest

regards
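As a follow-up to the library check suggested above, a minimal sketch (the install prefix is only an example):

# list shared libraries that httpd cannot resolve
ldd /usr/local/apache2054/bin/httpd | grep "not found"
# if the missing libraries live in /usr/local/lib, add that directory
# to the runtime linker's default search path
crle -u -l /usr/local/lib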

posted by Brahma at 3:32 PM 0 comments

Replacing Multiple Failed Disks on StoreEdge T3 Array

Replacing Multiple Failed Disks on StoreEdge T3 Array

You had a double failure, the data is lost, so the system can't re-enable it on its own. The simplest is to remove the volume and rebuild it so that it can create the parity correctly.

With a single failure it would have recovered on its own when you replaced the drive; if it doesn't, you recover it with

vol recon <drive> [from Standby]

if you didn't have a standby it can recover anyway, it just takes longer.

Also see to it that you have the latest firmware for the array and that no disk in the log has given the "disk error 03" code, which might indicate that they will be kicked out of the config during an upgrade of firmware or very soon after.

running the command

vol verify <volume>

once a week is a good practice, since it will test for read failures hopefully before you have double ones; check the log for the test result.

posted by Brahma at 3:31 PM 0 comments

Cisco Counters meaning

Counters

* Packets input - Total number of error-free packets received.
* Broadcasts - Total number of broadcast or multicast packets received.
* Runts - Number of packets discarded because they are smaller than the medium's minimum packet size.
* Giants - Number of packets that are discarded because they exceed the medium's maximum packet size.
* Throttle - This counter indicates the number of times the input buffers of an interface have been cleaned because they have not been serviced fast enough or they are overwhelmed. Typically, an explorer storm can cause the throttles counter to increment. It's important to note that every time you have a throttle, all the packets in the input queue get dropped. This causes very slow performance and may also disrupt existing sessions.
* Parity - Number of parity errors on the HSSI.
* RX Disabled - Indicates inability to get a buffer when accessing a packet.
* Input Errors - Sum of all errors that prevented the receipt of datagrams. This may not balance with the sum of the enumerated output errors, because some datagrams may have more than one error and others may have errors that do not fall into any of the specific categories.
* CRC - Cyclic redundancy checksum generated mismatch. CRC errors also are reported when a far-end abort occurs and when the idle flag pattern is corrupted. This makes it possible to get CRC errors even when there is no data traffic.
* Frame - Number of packets received incorrectly having a CRC error and a noninteger number of octets.
* Overrun - Number of times the serial receiver hardware was unable to hand received data to a hardware buffer because the input rate exceeded the receiver's ability to handle the data.
* Ignored - Number of received packets ignored by the interface because the interface hardware ran low on internal buffers.
* Abort - Number of packets whose receipt was aborted.
* Bytes - Total number of bytes, including data and MAC encapsulation, transmitted by the system.
* Underruns - Number of times that the far-end router's transmitter has been running faster than the near-end router's receiver can handle. This may never happen (be reported) on some interfaces.
* Congestion Drop - Number of messages discarded because the output queue on an interface grew too long.
* Output Errors - Sum of all errors that prevented the final transmission. This may not balance with the sum of the enumerated output errors, because some datagrams may have more than one error and others may have errors that do not fall into any of the specific categories.
* Interface Resets - Number of times an interface has been completely reset.
* Restarts - Number of times the controller was restarted because of errors.
* Carrier Transitions - Number of times the carrier detect signal of a serial interface has changed state.


posted by Brahma at 3:31 PM 0 comments

solaris 9 ssh hangs on exit ?

solaris 9 ssh hangs on exit ?

On solaris 9, recommended patches as of Dec 5th applied.

On occasion, exiting from an ssh session to another machine hangs. Is there a fix for that? We're using Solaris' ssh.

Thanks

> On occasion, exiting from an ssh session to another machine hangs. Is there
> a fix for that ?

It usually happens when you start a background process and it doesn't properly close its stdin/out/err. Adding >/dev/null and 2>&1 to the command line usually helps. Keyword here is "usually" -- if you run a program that does something funny, like piping its stdout to a child's stdin with dup(), all bets are off.

The real problem is in SSH protocol and that can't be fixed in implementation. So the real fix is to not use ssh.

Dima--

>The real problem is in SSH protocol and that can't be fixed in
>implementation. So the real fix is to not use ssh.

So what *do* you use, if, say, you want to go from your cable modem connection and log into the "shell account" you have on an isp? Everyone says that telnet is way, way too dangerous, too easy to hack into.

Thanks

> So what *do* you use, if, say, you want to go from your

You use ssh. DM was being pedantic in saying that because the protocol is broken then the only "fix" is not to use it -- unless you fancy improving the protocol. On the other hand, I'm guessing that ssh handles all of your needs bar the occasional hung session. Join the club.


Read the section on Escape Characters in the manual and when your session hangs on exit apply:

[RETURN] ~ &

or

[RETURN] ~ .

as appropriate. Noting, of course, that if you've ssh'd through several boxes you may need to increase the number of escape chars.

Cheers,

Ian

> the club.

Yep. Interactive sessions are not a problem as there's a human there to press enter-tilde-dot if it doesn't close. It's when you run it from e.g. cron that you should worry.

Dima
--
Things seemed simpler before we kept computers. -- IX, Revelation

posted by Brahma at 3:30 PM 0 comments

Solaris 9 sshd / ssh exit codes

Solaris 9 sshd / ssh exit codes

On Solaris 9, with the latest ssh/sshd patches installed ...

114356-03: SunOS 5.9: /usr/bin/ssh Patch
113273-06: SunOS 5.9: /usr/lib/ssh/sshd Patch

... why is there inconsistent behaviour with ssh exit codes? Why are there sporadic exit codes of 0 from the following ssh command?

% repeat 100 sh -c 'ssh -n localhost "exit 33" ; echo $?'
3333333333


333333333333333333333333333333330333333333300003333333333333333333333
^Z
Suspended

Juergen Keil <[email protected]> writes:
>On Solaris 9, with the latest ssh/sshd patches installed ...
> 114356-03: SunOS 5.9: /usr/bin/ssh Patch
> 113273-06: SunOS 5.9: /usr/lib/ssh/sshd Patch
>... why is there inconsistent behaviour with ssh exit codes? Why are
>there sporadic exit codes of 0 from the following ssh command?


It's a bug which is being addressed.

(There's a race condition between processes exiting and stdin/stdout being closed, so sshd can close the connection before it has seen the exit status)

Casper

>>... why is there inconsistent behaviour with ssh exit codes? Why are
>>there sporadic exit codes of 0 from the following ssh command?

> It's a bug which is being addressed.

> (There's a race condition between processes exiting and stdin/stdout
> being closed, so sshd can close the connection before it has seen
> the exit status)

Is that the same bug that causes my CVS over ssh sessions to hang?

Dragan

-
>0
>33
>33
>^Z
>Suspended

Obviously your "localhost" is round-robin DNS to hosts with different definitions of "exit"! Have you had a visit from my SAs?

Is it more informative to do "uname ; exit 33" to let you see whether the remote command got run?

>Is that the same bug that causes my CVS over ssh sessions to hang?

It happens mostly on Linux clients.... Here is what I do for my Star backup script:

_cmd=`eval echo "$STAR -cM -time f=$RTAPE VOLHDR= $host:$i $STAR_ARGS $STAR_EXTRA -C $i . \; r=\$\? \; sleep 10 \; exit \$r"`
ssh -q "$host" -l root "$_cmd"
excode=$?
if [ $excode -ne 0 ]; then
    echo "----> EXITCODE: $excode for $host:$i"
fi


posted by Brahma at 3:29 PM 0 comments

ssh not ending (sometimes) from inside a script, why?

ssh not ending (sometimes) from inside a script, why?

Okay I have this script file that runs from a cron job (on a unix box running Solaris 9 with SSH version Sun_SSH_1.0 protocols 1.5/2.0) and most of the time it works just fine. Except every so often one of the three ssh commands I have in the script just doesn't know it's done and that of course causes the whole thing to hang!

The ssh command has executed. I can tell this because the command is as follows:

ssh username@hostname "ls /somedirectory" >somedir.file

and a new somedir.file has been created on the machine running the cron script.

Also, when this happens I can use the "kill sshprocessid" command and the cron script will continue, fat dumb and happy!

The machine I'm "ssh"ing to is a unix box with OpenSSH_3.7.1p2 protocols 1.5/2.0, OpenSSL 0.9.6 on it.

I have tried running the ssh command in -v mode and looking at the output, but can't really make much from it, though I can see a difference in the order of things when the command hangs.

The -v output when ssh ended properly is as follows:

1 - debug1: Entering interactive session.
2 - debug1: client_init id 0 arg 0
3 - debug1: Sending command: ls /Rawdata/Archive2/*VCID4.tlm.gz
4 - debug1: channel 0: open confirm rwindow 0 rmax 32768
5 - debug1: channel 0: read<=0 rfd 6 len 0
6 - debug1: channel 0: read failed
7 - debug1: channel 0: input open->drain
8 - debug1: channel 0: close_read
9 - debug1: channel 0: input: no drain shortcut
10 - debug1: channel 0: ibuf empty
11 - debug1: channel 0: input drain->closed
12 - debug1: channel 0: send eof
13 - debug1: channel 0: rcvd eof
14 - debug1: channel 0: output open->drain
15 - debug1: channel: 0 rcvd request for exit-status
16 - debug1: cb_fn 267a4 cb_event 91
17 - debug1: channel 0: rcvd close
18 - debug1: channel 0: obuf empty
19 - debug1: channel 0: output drain->closed
20 - debug1: channel 0: close_write
21 - debug1: channel 0: send close
22 - debug1: channel 0: full closed
23 - debug1: channel_free: channel 0: status: The following connections are open: #0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)
24 - debug1: channel_free: channel 0: dettaching channel user
25 - debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 0.6 seconds
26 - debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
27 - debug1: Exit status 0

The -v output when ssh hung is as follows:

1 - debug1: Entering interactive session.
2 - debug1: client_init id 0 arg 0
3 - debug1: Sending command: ls /Rawdata/Archive2/*VCID4.tlm.gz
4 - debug1: channel 0: open confirm rwindow 0 rmax 32768
5 - debug1: channel 0: read<=0 rfd 6 len 0
6 - debug1: channel 0: read failed
7 - debug1: channel 0: input open->drain
8 - debug1: channel 0: close_read
9 - debug1: channel 0: input: no drain shortcut
10 - debug1: channel 0: ibuf empty
11 - debug1: channel 0: input drain->closed
12 - debug1: channel 0: send eof
13 - debug1: channel 0: rcvd eof
14 - debug1: channel 0: output open->drain
15 - debug1: channel 0: obuf empty
16 - debug1: channel 0: output drain->closed
17 - debug1: channel 0: close_write
18 - debug1: channel 0: send close
19 - debug1: channel: 0 rcvd request for exit-status
20 - debug1: cb_fn 267a4 cb_event 91
21 - debug1: channel 0: rcvd close
22 - debug1: channel 0: full closed
23 - debug1: channel_free: channel 0: status: The following connections are open: #0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)

24 - debug1: channel_free: channel 0: dettaching channel user


Note I put the numbers on the trace.

I figure there is a race condition of some kind going on for the end of command signal and once in awhile it gets missed but.... I just don't know where else to look....

I'll take any ideas or comments, please!

http://www.snailbook.com/faq/background-jobs.auto.html

--

Subject: Re: ssh not ending (sometimes) from inside a script, why?

Thank you for the link. I did read through this before posting my question but I just read through it again and wonder if the answer to my question is not simply it is going to hang sometimes and there is nothing I can do about it.

This is of course not the answer I was hoping for but if it is so I guess I should move on and just write code to kill the process after a period of time. I just don't think that this answer is very pretty :(

Again any thoughts or comments are welcome, and thanks.
Kym

End of messages
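For what it's worth, the "kill it after a while" workaround Kym mentions can be scripted in plain Bourne shell; a rough sketch (the 300-second limit, host and command are only examples):

#!/bin/sh
# run the ssh command in the background and remember its pid
ssh -n username@hostname "ls /somedirectory" > somedir.file &
sshpid=$!

# background watchdog: kill ssh if it is still around after 300 seconds
( sleep 300; kill $sshpid 2>/dev/null ) &
watchdog=$!

# wait for ssh; the status is lost if the watchdog had to kill it
wait $sshpid
status=$?

# tidy up the watchdog subshell (its sleep child may linger harmlessly)
kill $watchdog 2>/dev/null
exit $status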

posted by Brahma at 3:27 PM 0 comments

SSH Frequently Asked Questions

SSH Frequently Asked Questions
Sometimes my SSH connection hangs when exiting -- the shell (or remote command) exits, but the connection remains open, doing nothing.

Quick Fix

You're probably using the OpenSSH server, and started a background process on the server which you intended to continue after logging out of the SSH session. Fix: redirect the background process stdin/stdout/stderr streams (e.g. to files, or /dev/null if you don't care about them). For example, this hangs:

client% ssh server
server% xterm &
server% logout
hangs...


but this behaves as expected:

client% ssh server
server% xterm < /dev/null >& /dev/null &
server% logout
SSH session terminates
client%

Short Explanation

This problem is usually due to a feature of the OpenSSH server. When writing an SSH server, you have to answer the question, "When should the server close the SSH connection?" The obvious answer might seem to be: close it when the server-side user program started by client request (shell or remote command) exits. However, it's actually a bit more complicated; this simple strategy allows a race condition which can cause data loss (see the explanation below). To avoid this problem, sshd instead waits until it encounters end-of-file (eof) on the pipes connecting to the stdout and stderr of the user program.

This strategy, however, can have unexpected consequences. In Unix, an open file does not return eof until all references to it have been closed. When you start a background process from the shell on the server, it inherits references to the shell's standard streams. Unless you prevent this by redirecting these, or the process closes them itself (daemons will generally do this), the existence of the new process will cause sshd to wait indefinitely, since it will never see eof on the pipe connecting it to the (now defunct) shell process -- because that pipe also connects it to your background process.

This design choice has changed over time. Early versions of OpenSSH behaved as described here. For some time, it was changed to exit immediately upon exit of the user program; then, it was changed back when the possibility of data loss was discovered.

Race Condition Details

As an example, let's take the simple case of:

ssh server cat foo.txt

This should result in the entire contents of the file foo.txt coming back to the client -- but in fact, it may not. Consider the following sequence of events:

* The SSH connection is set up; sshd starts the target account's shell as shell -c "cat foo.txt" in a child process, reading the shell's stdout and sending the data over the SSH connection. sshd is waiting for the shell to exit.


* The shell, in turn, starts cat foo.txt in a child process, and waits for it to exit. The file data from foo.txt which cat writes to its stdout, however, does not pass through the shell process on its way to sshd. cat inherits its stdout file descriptor (fd) from its parent process, the shell -- that fd is a direct reference to the pipe connecting the shell's stdout to sshd.
* cat writes the last chunk of data from foo.txt, and exits; the data is passed to the kernel via the write system call, and is waiting in the pipe buffer to be read by sshd. The shell, which was waiting on the cat process, exits, and then sshd in turn exits, closing the SSH connection. However, there is a race condition here: through the vagaries of process scheduling, it is possible that sshd will receive and act on the SIGCHLD notifying it of the shell's exit, before it reads the last chunk of data from the pipe. If so, then it misses that data.

This sequence of events can, for example, cause file truncation when using scp.

posted by Brahma at 3:22 PM 0 comments

Friday, October 07, 2005

Qlogic Driver Solaris

All,

How do I get my Solaris 8 server to "see" what is attached on the switch?

I've read the Procedure Manual (350 pages) for the Brocade 3250 as well as the QLogic Manuals but so far I have not found any good documentation that deals with the integration between the HBA and the Brocade switch.

I have 1 LTO2 tape drive on a 3250 Brocade switch, I have a lot more actually, but I want to start out simple, with 1 attached device.

Also, in my cfgadm -al it shows the device that is attached to the switch as failing...

c3                      fc-private   connected    configured    unknown
c3::500104f00052cf22    tape         connected    configured    unknown
c4                      fc-private   connected    configured    unknown
c4::500104f00052cf25    tape         connected    configured    unknown
c5                      fc-fabric    connected    configured    unknown
c5::500104f00052cf1c    unavailable  connected    configured    failing
c5::500104f00052cf1f    unavailable  connected    configured    failing
c6                      fc           connected    unconfigured  unknown

Note: c5 is the card/port(s) that is/are connected to the switch.

Diagram:

Sun V440 (HBA) -> Brocade switch (ports P0-P3) -> HP Ultrium-LTO2 2Gbps drive.

Here is the switch information:

s3250:admin> nsshow
{
 Type Pid    COS     PortName                NodeName                 TTL(sec)
 N    010000;      3;21:00:00:e0:8b:14:12:05;20:00:00:e0:8b:14:12:05; na
    FC4s: FCP
    Fabric Port Name: 20:00:00:05:1e:35:78:77
 NL   010155;      3;50:01:04:f0:00:52:cf:04;50:01:04:f0:00:52:cf:03; na
    FC4s: FCP [HP Ultrium 2-SCSI K470]
    Fabric Port Name: 20:01:00:05:1e:35:78:77
The Local Name Server has 2 entries }
s3250:admin>


I have set up a zone for this tape drive:

s3250:admin> zoneshow
Defined configuration:
 cfg:  myconfig001  zone1
 zone: zone1  3,21; 3,50
 zone: zone2  50:01:04:f0:00:52:cf:04
 zone: zone3  50:01:04:f0:00:52:cf:03

Effective configuration:
 cfg:  myconfig001
 zone: zone1  3,21
              3,50

Here is my current qlc.conf:

# cat qlc.conf | grep -v ^# | grep . | sort
hba0-adapter-hard-loop-ID=0;
hba0-enable-adapter-hard-loop-ID=0;

# pkginfo -i | grep qlc
system      SUNWqlc     Qlogic ISP 2200/2202 Fibre Channel Device Driver
system      SUNWqlcx    Qlogic ISP 2200/2202 Fibre Channel Device Driver (64 bit)

# prtdiag -v | grep qlc
0 pci 66 PCI2 SUNW,qlc-pci1077,2312 (scsi-+

0 pci 66 PCI2 SUNW,qlc-pci1077,2312 (scsi-+

0 pci 33 PCI3 SUNW,qlc-pci1077,2312 (scsi-+

0 pci 33 PCI3 SUNW,qlc-pci1077,2312 (scsi-+

Yes, I know I have two dual-port cards; I am only using the first one at the moment.

I've noticed there are actual QLOGIC HBA drivers, I have installed them in the past but either I did not configure them properly, or something -- as I did not notice any effect with the drivers installed or not installed.

The drivers being:


Solaris.QLogic.HBA.Fiber.Drivers$ ls
Troubleshooting\ For\ The\ QLogic\ 2342\ HBAs.pdf
qla2300-readme.txt
qla2300.sparc_pkg.Z
sansurfer2.0.30b23-1_solaris_install.bin
scli.1.06.16-15.SPARC-X86.Solaris.pkg.Z

Anyone have any clues?

Thanks,

Justin.

posted by Brahma at 11:24 AM 1 comments

large file limitation Veritas

Subject: RE: large file limitation

Thanks to all for your replies, obviously many options but I went with the option below to use fsadm to reconfigure the FS to use largefiles on the fly, thanks to VxFS, and also thanks to James Scott for a similar reply. Worked like a charm.

Paul

Subject: Re: large file limitation

VxFS. Good :-)

2GB not an issue, can enable on the fly without unmounting etc.
/usr/lib/fs/vxfs/fsadm -o largefiles /MOUNTPOINT

( ex: /usr/lib/fs/vxfs/fsadm -o largefiles /data1)

verify:

/usr/lib/fs/vxfs/fsadm /data1

There you are, that is why you have Veritas :-)

Cheers

E


[email protected] wrote:> Guru's>> I have 6gig of data that I need to tar up and move to my local driveand> would like to move it all in one tar file. However, my tar bombs out> after the 2g limitation. I'm running Solaris 8 with vxfs but didn't> make the file system using the largefiles option.>> Is there a way to tar up a directory structure larger than 2g at this> point. I thought this was only a restriction in 2.6 and prior.>> TIA>> Paul Sagneri

posted by Brahma at 11:24 AM 0 comments

Permissions messed in /devices - easy fix?

Subject: [SUMMARY] Permissions messed in /devices - easy fix?

-----Original Post:
Due to an errant command during a security-tightening process, I have a Solaris 8 box that has had all of the Other Write bits turned off on the special files in /devices.

Is there a reasonably easy method to get Solaris to regenerate /devices from scratch?

I've tried "touch /reconfigure and reboot" - no dice - doesn't seem to regenerate something that's needed that's already there.

Suggestion: how about nuking /devices and /dev from a script as the last thing the OS does on the way down, then boot -r?

-----

IMHO the best responses came from Casper Dik and David Foster to use
pkgchk -f SUNWcsd
to force the reinstallation (and presumably permissions) of the devices.
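A minimal sketch of that pkgchk approach (run as root; SUNWcsd is the core devices package named above):

# report discrepancies against the package database (skipping volatile files)
pkgchk -n SUNWcsd
# then let pkgchk put ownership and permissions back to the recorded values
pkgchk -f SUNWcsd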

Unfortunately, this advice came too late. What I did was
touch /reconfigure


rm -rf /dev /devices
from single user mode
init 0
at which point the system locked.

From this point on the system would lock sometime shortly after reading /etc/system. I made many efforts to restore /devices to no avail after that point. Attempts included:
*) disabled SVM - I could, after all, only really change one side of the mirror when booting from CD.
*) boot from CD, copy /devices and /dev from the running image to the boot drive.
*) boot from CD, use the suggested devfsadm -r /tmp/a method to rebuild the hierarchy.
*) Last message I was getting looked like the system had loaded the RDAC driver for my SAN disks, so I disabled it.

Basically got the same lock every boot attempt.

Interestingly enough, on all these boot attempts where the 880 locked, it would not even respond to BREAK on the ttya line (what I'm using for console). In all attempts I had to power cycle the box.

The box is now back in production with a fresh new Solaris 9 load. It is something we probably should have started at Tuesday noon instead of Wednesday afternoon, but hindsight is always 20/20.

Thanks for all replies.

-----
This would generally be my experience, from Grant:
I would be careful about the /devices directory. I had to restore a system from backup, using NetBackup, and the system wouldn't boot. After I rebuilt the system from a full backup, the system wouldn't boot. I even tried rebooting with -r and even touch /reconfigure. It turned out the /devices tree wasn't restored (NetBackup doesn't back up the /devices directory). The system would hang right after reading the system file. I finally had to boot from cdrom, run devfsadm on the mounted /a filesystem, and then rebooted. Then it booted. Hope this helps.
-----
From Tim:
Possibly a red herring, but wouldn't a
sudo find /dev -name '*' -exec chmod o+w {} \;
do the trick?
Response: Yes, but... Not everything should have write permission.


-----
Remove everything *except* for your boot device, rename /etc/path_to_inst then 'reboot -- -ra'. It'll ask you if you want to rebuild the path_to_inst.
Response: might have worked, but I didn't need path_to_inst rebuilt as far as I knew. As stated earlier, keeping the path to the boot device would have been useful.

posted by Brahma at 11:23 AM 0 comments

Live Upgrade Help

Subject: UPDATE: Live Upgrade HelpHi Gurus,

I haven't received any response to this. I need help to resolve this as we are planning to upgrade all our Production Servers from Solaris 2.6 to Solaris 8 by next week. Any help will be greatly appreciated.

I tried it on a different box with the same result.
One thing which I noticed is SUNWluu and SUNWlur: I installed both from my Jumpstart box (Solaris 8 5/03), however the Live Upgrade version is "Live Upgrade 2.0 10/01".
I have only one version of Solaris 8 5/03 on my Jumpstart box.
Also I am not clear about the profile which I am using. My profile file is upgrade and its content is
install_type upgrade

Here is the output of my Live Upgrade:

--------------------------------------------------------------------

# lucreate -c "Solaris26" -m /:/dev/dsk/c0t1d0s0:ufs -m/usr:/dev/dsk/c0t1d0s5:ufs -m /var:/dev/dsk/c0t1d0s3:ufs -m/opt:/dev/dsk/c0t1d0s4:ufs -m /home:/dev/dsk/c0t1d0s6:ufs -m-:/dev/dsk/c0t1d0s1:swap -m /citadon:/dev/dsk/c0t1d0s7 -n "Solaris_8"Please wait while your system configuration is determined.No name for Current BE.Current BE is named <Solaris26>.Creating initial configuration for primary BE <Solaris26>.PBE configuration successful: PBE name <Solaris26> PBE Boot Device

Page 97: Solaris Real Stuff

</dev/dsk/c0t0d0s0>.Determining what file systems should be in the new BE.

Searching /dev for possible BE filesystem devices

Please wait while the configuration files are updated.
Please wait. Configuration validation in progress...

********************************************************************************
Beginning process of creating Boot Environment <Solaris_8>.
No more user interaction is required until this process is complete.
********************************************************************************

Setting BE <Solaris_8> state to Not Complete.
Creating file systems on BE <Solaris_8>.
Creating <ufs> file system on </dev/dsk/c0t1d0s0>.
/dev/rdsk/c0t1d0s0: 1027216 sectors in 218 cylinders of 19 tracks, 248 sectors
        501.6MB in 14 cyl groups (16 c/g, 36.81MB/g, 17664 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 75680, 151328, 226976, 302624, 378272, 453920, 529568, 605216, 680864,
 756512, 832160, 907808, 983456,
Creating <ufs> file system on </dev/dsk/c0t1d0s6>.
/dev/rdsk/c0t1d0s6: 2101552 sectors in 446 cylinders of 19 tracks, 248 sectors
        1026.1MB in 23 cyl groups (20 c/g, 46.02MB/g, 11200 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 94528, 189024, 283520, 378016, 472512, 567008, 661504, 756000, 850496,
 944992, 1039488, 1133984, 1228480, 1322976, 1417472, 1511968, 1606464,
 1700960, 1795456, 1889952, 1984448, 2078944,
Creating <ufs> file system on </dev/dsk/c0t1d0s4>.
/dev/rdsk/c0t1d0s4: 12326592 sectors in 2616 cylinders of 19 tracks, 248 sectors
        6018.8MB in 119 cyl groups (22 c/g, 50.62MB/g, 6208 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 103952, 207872, 311792, 415712, 519632, 623552, 727472, 831392, 935312,
 1039232, 1143152, 1247072, 1350992, 1454912, 1558832, 1662752, 1766672,
 1870592, 1974512, 2078432, 2182352, 2286272, 2390192, 2494112, 2598032,
 2701952, 2805872, 2909792, 3013712, 3117632, 3221552, 3317280, 3421200,
 3525120, 3629040, 3732960, 3836880, 3940800, 4044720, 4148640, 4252560,
 4356480, 4460400, 4564320, 4668240, 4772160, 4876080, 4980000, 5083920,
 5187840, 5291760, 5395680, 5499600, 5603520, 5707440, 5811360, 5915280,
 6019200, 6123120, 6227040, 6330960, 6434880, 6538800, 6634528, 6738448,
 6842368, 6946288, 7050208, 7154128, 7258048, 7361968, 7465888, 7569808,
 7673728, 7777648, 7881568, 7985488, 8089408, 8193328, 8297248, 8401168,
 8505088, 8609008, 8712928, 8816848, 8920768, 9024688, 9128608, 9232528,
 9336448, 9440368, 9544288, 9648208, 9752128, 9856048, 9951776, 10055696,
 10159616, 10263536, 10367456, 10471376, 10575296, 10679216, 10783136,
 10887056, 10990976, 11094896, 11198816, 11302736, 11406656, 11510576,
 11614496, 11718416, 11822336, 11926256, 12030176, 12134096, 12238016,
Creating <ufs> file system on </dev/dsk/c0t1d0s5>.
/dev/rdsk/c0t1d0s5: 4198392 sectors in 891 cylinders of 19 tracks, 248 sectors
        2050.0MB in 41 cyl groups (22 c/g, 50.62MB/g, 8256 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 103952, 207872, 311792, 415712, 519632, 623552, 727472, 831392, 935312,
 1039232, 1143152, 1247072, 1350992, 1454912, 1558832, 1662752, 1766672,
 1870592, 1974512, 2078432, 2182352, 2286272, 2390192, 2494112, 2598032,
 2701952, 2805872, 2909792, 3013712, 3117632, 3221552, 3317280, 3421200,
 3525120, 3629040, 3732960, 3836880, 3940800, 4044720, 4148640,
Creating <ufs> file system on </dev/dsk/c0t1d0s3>.
/dev/rdsk/c0t1d0s3: 1027216 sectors in 218 cylinders of 19 tracks, 248 sectors
        501.6MB in 14 cyl groups (16 c/g, 36.81MB/g, 17664 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 75680, 151328, 226976, 302624, 378272, 453920, 529568, 605216, 680864,
 756512, 832160, 907808, 983456,
Mounting file systems for BE <Solaris_8>.
Calculating required sizes of file systems for BE <Solaris_8>.
Populating file systems on BE <Solaris_8>.
Copying file system contents to BE <Solaris_8>.
INFORMATION: Setting asynchronous flag on ABE <Solaris_8> mount point </.alt.2703/var> file system type <ufs>.
INFORMATION: Setting asynchronous flag on ABE <Solaris_8> mount point </.alt.2703/usr> file system type <ufs>.
INFORMATION: Setting asynchronous flag on ABE <Solaris_8> mount point </.alt.2703/opt> file system type <ufs>.
INFORMATION: Setting asynchronous flag on ABE <Solaris_8> mount point </.alt.2703/home> file system type <ufs>.
INFORMATION: Setting asynchronous flag on ABE <Solaris_8> mount point </.alt.2703/> file system type <ufs>.
Copying of file system / directory </var> is in progress...
Copying of file system / directory </var> completed successfully.
Copying of file system / directory </usr> is in progress...
Copying of file system / directory </usr> completed successfully.

Page 99: Solaris Real Stuff

Copying of file system / directory </opt> is in progress...
Copying of file system / directory </opt> completed successfully.
Copying of file system / directory </home> is in progress...
Copying of file system / directory </home> completed successfully.
Copying of file system / directory </> is in progress...
Copying of file system / directory </> completed successfully.
Creating compare database for file system </var>.
Creating compare database for file system </usr>.
Creating compare database for file system </opt>.
Creating compare database for file system </home>.
Creating compare database for file system </>.
Updating compare database on other BEs.
Updating compare database on BE <Solaris_8>.
Compare databases updated on all BEs.
Making Boot Environment <Solaris_8> bootable.
Making the ABE bootable.
Updating ABE's /etc/vfstab file.
The update of the vfstab file on the ABE succeeded.
Updating ABE's /etc/mnttab file.
The update of the mnttab file on the ABE succeeded.
Updating ABE's /etc/dumpadm.conf file.
The update of the dumpadm.conf file on the ABE succeeded.
Updating partition ID tag on boot environment <Solaris_8> device </dev/rdsk/c0t1d0s2> to be root slice.
Updating boot loader for <SUNW,UltraSPARC-IIi-cEngine> on boot environment <Solaris_8> device </dev/dsk/c0t1d0s0> to match OS release.
Making the ABE <Solaris_8> bootable succeeded.
Setting BE <Solaris_8> state to Complete.
Creation of Boot Environment <Solaris_8> successful.
Creation of Boot Environment <Solaris_8> successful.
# luupgrade -u -n "Solaris_8" -s /opt/install/Solaris_28_503 -j /opt/install/config/liveupgrade/upgrade
Validating the contents of the media </opt/install/Solaris_28_503>.
The media is a standard Solaris media.
The media contains an operating system upgrade image.
The media contains <Solaris> version <8>.
The media contains patches for the product.
Locating upgrade profile template to use.
Locating the operating system upgrade program.
Checking for existence of previously scheduled Live Upgrade requests.
Creating upgrade profile for BE <Solaris_8>.
Updating ABE's /etc/vfstab file.
The update of the vfstab file on the ABE succeeded.
Determining packages to install or upgrade for BE <Solaris_8>.
Performing the operating system upgrade of the BE <Solaris_8>.
CAUTION: Interrupting this process may leave the boot environment unstable or unbootable.
Segmentation Fault
The operating system patch installation completed.
WARNING: BE file system </dev/dsk/c0t1d0s6> not mounted.
The Solaris upgrade of the BE <Solaris_8> failed.

--------------------------------------------------------------------

Hi Gurus,

I figured out the problem. It was a typo in the lucreate command near /opt (an 's' was missing). After rectifying that I was able to create the BE, but the upgrade process failed with the error "segmentation fault".

luupgrade -u -n "Solaris_8" -s /install/Solaris_28_503/ -j /install/config/liveupgrade/upgrade

Snip.....

Creating upgrade profile for BE <Solaris_8>.
Updating ABE's /etc/vfstab file.
The update of the vfstab file on the ABE succeeded.
Determining packages to install or upgrade for BE <Solaris_8>.
Performing the operating system upgrade of the BE <Solaris_8>.
CAUTION: Interrupting this process may leave the boot environment unstable or unbootable.

Segmentation Fault
The operating system patch installation completed.
The Solaris upgrade of the BE <Solaris_8> failed.

Any help/pointer will be greatly appreciated. I will summarize.

Regards

B

Hi Gurus,

This is my first Live Upgrade so please bear with me if my questions sound silly.


I am trying to do a live upgrade from Solaris 2.6 to Solaris 8 and ran into the following error:

# lucreate -c "Solaris_26" -m /:/dev/dsk/c0t1d0s0:ufs -m/usr:/dev/dsk/c0t1d0s3:

ufs -m /var:/dev/dsk/c0t1d0s4:ufs -m /opt:/dev/dsk/c0t1d06:ufs -m-:/dev/dsk/c0t1d0s1:swap -m /home:/dev/dsk/c0t1d0s5 -n "Solaris_8"

Please wait while your system configuration is determined.
No name for Current BE.
Current BE is named <Solaris_26>.
Creating initial configuration for primary BE <Solaris_26>.
PBE configuration successful: PBE name <Solaris_26> PBE Boot Device </dev/dsk/c0t0d0s0>.

Determining what file systems should be in the new BE.

Searching /dev for possible BE filesystem devices

luconfig: ERROR: Template filesystem definition failed for /opt..

ERROR: Configuration of BE failed.

I did the following steps to create a boot environment:

1) Install latest patch cluster on target box
2) Install SUNWluu and SUNWlur from Solaris 8 CD to target box.
3) Created partition on target box on second disk
4) Mount Jumpstart /opt/install to target box as /install
5) Ran lucreate to create a boot environment on target box second disk as below:

# lucreate -c "Solaris_26" -m /:/dev/dsk/c0t1d0s0:ufs -m/usr:/dev/dsk/c0t1d0s3:

ufs -m /var:/dev/dsk/c0t1d0s4:ufs -m /opt:/dev/dsk/c0t1d06:ufs -m-:/dev/dsk/c0t1d0s1:swap -m /home:/dev/dsk/c0t1d0s5 -n "Solaris_8"

1) Do I need to create a file system with newfs? Or will just a partition do?

2) On the Jumpstart box what steps do I need to follow besides creating a profile file which contains "install_type upgrade"?


3) How do I select cluster type? Cluster SUNWCall?

4) How do I select the Patch Cluster for Solaris 8 on jumpstart Box?

Any help will be greatly appreciated

TIA

B

posted by Brahma at 11:22 AM 0 comments

How do I tell what caused my machine to crash?

Subject: 6.1) How do I tell what caused my machine to crash?

The crash messages will usually be displayed on the console, and are usually logged to /var/adm/messages via syslog as well after a warm reboot. In older versions of Solaris, the "dmesg" command may also show crash messages. If your system repeatedly crashes with similar looking errors, try searching through the patch list on the Sun patch database for a description that matches your machine.

In versions of Solaris 2 up to and including Solaris 2.6, uncomment the "savecore" line in the file /etc/init.d/sysetup to enable crash dumps. As of Solaris 7 and later, crash dumps are enabled by default; see the manual page for dumpadm(1M) for information on how to customize system dump configuration.

To report a crash dump, you need a symbolic traceback for it to be useful to the person looking at it. Type the following:
cd /var/crash/`hostname`
echo '$c' | adb -k unix.0 vmcore.0

The "crash" utility can be useful for analyzing crash dumps forSolaris up to and including Solaris 8. "Crash" has been supersededby "mdb" (modular debugger) as of Solaris 8.

posted by Brahma at 11:18 AM 0 comments

What can I do if my machine slows to a crawl or just hangs?

Subject: 6.2) What can I do if my machine slows to a crawl or just hangs?

Try running "ps" to look for large numbers of duplicate programs or processes with a huge size field. Some system daemons occasionally can get into a state where they fork repeatedly and eventually swamp the system. Killing off the child processes doesn't do any good, so you have to find the "master" process. It will usually have the lowest pid.

Another useful approach is to run vmstat to pin down what resource(s) your machine is running out of. You can tell vmstat to give ongoing reports by specifying a report interval as its first argument.
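For example (the five-second interval is arbitrary, and the first line of output is an average since boot that can usually be ignored):

vmstat 5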

The programs "top" and "sps" are good for finding processes that are loading your system. "Top" will give you the processes that are consuming the most cpu time. "Sps" is a better version of "ps" that runs much faster and displays processes in an intuitive manner. Top is available at ftp://ftp.groupsys.com/pub/top/. Sps is available at ftp://ftp.csv.warwick.ac.uk/pub/solaris2/sps-sol2.tar.gz.

Doug Hughes <Doug dot Hughes at Eng dot Auburn dot EDU> has written a small, quick PS workalike called "qps", available from his web page at http://www.eng.auburn.edu/users/doug/second.html

Sometimes you run out of memory and you won't be able to run enough commands to even find out what is wrong. You will get messages of the type "out of memory" or "no more processes". Note that "out of memory" refers to virtual memory, not physical memory. On a Solaris system, virtual memory is generally equal to the sum of the swap space and the amount of physical memory (less a roughly constant amount for the kernel) on the machine. The command "swap -s" will tell you how much virtual memory is available.

You can sync the disks to minimize filesystem corruption if you have to crash the system:

Use the L1-A sequence to crash the system. If you are on an older system, type "g0" and you will get the message "panic: ... syncing file systems". When you see the word "done", hit L1-A again and reboot. On systems with the "new" prom, type "n" to get into the new command mode and type "sync".

posted by Brahma at 10:09 AM 0 comments

How do I find out how much physical memory a machine has?

Subject: 6.3) How do I find out how much physical memory a machine has?

Use /usr/sbin/prtconf if the machine is running Solaris. If it's a sun4u running Solaris 8 or previous, /usr/platform/sun4u/sbin/prtdiag is very helpful. It's /usr/sbin/prtdiag in Solaris 9 and later.
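For just the memory figure, a one-liner such as this works (the output shown is only an example):

/usr/sbin/prtconf | grep "Memory size"
Memory size: 1024 Megabytes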


On high-end machines, /usr/sbin/cfgadm -al can also provide memory information.

The banner message on reboot (or type "banner" in the monitor on machines with Openboot proms) will usually report the amount of physical memory.

Alternatively, you can open up the case and count SIMMS and/or memory boards.

A perl script "memconf" is also available that identifies the sizes and locations of SIMM/DIMM memory modules installed in a Sun system. It also works on several SPARC clones and with Sun Explorer data. It is maintained by Tom Schmidt <tschmidt at micron dot com>. Download memconf from http://www.4schmidts.com/unix.html

posted by Brahma at 10:07 AM 0 comments

How do I find out what my machine's memory is being used for? How can I tell if I need more memory?

Subject: 6.4) How do I find out what my machine's memory is being used for? How can I tell if I need more memory?

To discover how much virtual memory (i.e. swap) is free, run "swap -s" or "vmstat". If you're using tmpfs for /tmp, "df /tmp" will also work.

Discovering how physical memory is being used can be more difficult, however. Memory pages that are not being used by processes are used as a sort of extended cache, storing pages of memory-mapped files for possible later use. The kernel keeps only a small set of pages free for short-term use, and frees up more on demand. Hence the free memory reported by vmstat is not an accurate reflection, for example, of the amount of memory available for user processes.

An easy way to determine whether or not your machine needs more memory is to run vmstat and examine the po (page out) column and the sr (scan rate) column. If these columns consistently show large numbers, this suggests that your machine does not have enough memory to support its current workload, and frequently needs to write pages belonging to active processes to disk in order to free up enough memory to run the current job.

posted by Brahma at 10:06 AM 0 comments

How can my program tell what model Sun it is running on?


Subject: 9.2) How can my program tell what model Sun it is running on?

On older suns, the model type is encoded in the hostid. For suns with the "Openboot" prom (all sparcstations and the 600 series), /usr/sbin/prtconf will reveal the model type.
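From a program or script, the platform and hardware class are also available directly; a small sketch (the sample output is only an example):

uname -i     # platform name, e.g. SUNW,Ultra-5_10
uname -m     # machine hardware class, e.g. sun4u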

"Suntype", written by John DiMarco ([email protected]) is a shellscript which does the appropriate thing on all suns. It is availablefor anonymous ftp at ftp://ftp.cs.toronto.edu/pub/jdd/suntype

Alternatively, grab Michael Cooper's <mcooper at magnicomp dot com> "sysinfo" program, which provides all sorts of information about a given system, including the machine type. sysinfo is available on the web at http://www.magnicomp.com/, although it is now a commercial product that is free only for educational and non-profit organizations.

posted by Brahma at 10:05 AM 0 comments

My remote ufsdump is failing with a "Protocol botched" message.

Subject: 10.1) My remote ufsdump is failing with a "Protocol botched" message. What do I do?

The problem produces output like the following:

...
DUMP: Dumping /dev/rsd0a (/) to /dev/nrst8 on host foo
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 8232 blocks (4.02MB) on 0.00 tape(s).
DUMP: Protocol to remote tape server botched (in rmtgets).
rdump: Lost connection to remote host.
DUMP: Bad return code from dump: 1

This occurs when something in .cshrc on the remote machine prints something to stdout or stderr (eg. stty, echo). The remote ufsdump command doesn't expect this, and chokes. Other commands which use the rsh protocol (eg. rdist, rtar) may also be affected.

The way to get around this is to add the following line near the beginning of .cshrc, before any command that might send something to stdout or stderr:

if ( ! $?prompt ) exit


This causes .cshrc to exit when prompt isn't set, which distinguishes between remote commands (eg. rdump, rsh) where these variables are not set, and interactive sessions (eg. rlogin) where they are.

posted by Brahma at 10:04 AM 0 comments

How do I restore to a different location the contents of a tarfile created with absolute pathnames

Subject: 10.4) How do I restore to a different location the contents of a tarfile created with absolute pathnames?

Tarfiles should not normally be created with absolute pathnames, only with relative pathnames. Do not type "tar c /path/name" to create a tar archive, type "(cd /path; tar c name)" instead.

Note: if you do "(cd /path/name; tar c .)", you will indeed avoid absolute pathnames, but beware that the tarfile created may silently overwrite the permissions of the current directory when unpacked. That's OK if you unpack it via:
"mkdir name; cd name; tar xf /my/tarfile.tar"
That's not OK if you unpack it via:
"cd /tmp; tar xf /my/tarfile.tar" -- you will change the permissions of /tmp.

If you do have an archive created with absolute pathnames, you can unpack it in a different location by using GNU's version of tar, which will strip off the leading /.
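For example, assuming GNU tar is installed as gtar (it strips the leading / on extraction by default; the destination directory is just an example):

cd /desired/location
gtar xf /my/abspath.tar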

Alternatively, you can use pax to strip off the leading /, as follows:

pax -r -s '/^\///' <abspath.tar

Or you can use chroot and a statically linked version of tar, as follows:

# cp /usr/sbin/static/tar /tmp/restore
# cd /tmp/restore
# cat abspath.tar | chroot /tmp/restore /tar xf -

posted by Brahma at 10:03 AM 0 comments

Why do both my net interfaces have the same ethernet address?

Subject: 12.1) Why do both my net interfaces have the same ethernet address?

The Ethernet version 2.0 specification (November 1982) states:


The physical address of each station is set by network management to a unique value associated with the station, and distinct from the address of any other station on any Ethernet. The setting of the station's physical address by network management allows multiple data link controllers connected to a single station to respond to the same physical address.

This doesn't normally constitute a problem because each interface will typically be on a different subnet. If, for some reason, different ethernet addresses are required on different interfaces (for example, to attach two interfaces to the same subnet), a new one may be assigned using the ifconfig command. Alternatively, for all modern Sun hardware, you can set the "local-mac-address?" eeprom variable to "true", which will cause each NIC to use a unique MAC address. This is needed for many failover and trunking configurations.
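A minimal example of flipping that variable from a running system (it takes effect at the next reboot):

eeprom 'local-mac-address?=true'
eeprom 'local-mac-address?'      # verify the new setting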

posted by Brahma at 9:58 AM 0 comments

How do I set my hme interface to e.g. 100Mb full duplex

Subject: 12.3) How do I set my hme interface to e.g. 100Mb full duplex?

This applies only to Solaris 2.5 or later; hme interfaces are not supported under SunOS 4.x or earlier versions of Solaris.

Sun's 10/100 network interface on the Ultra systems and on the SunSWIFT network cards are capable of negotiating with a network switch; if this is working, and if the other end is capable of 100Mb full duplex (FD) operation, the hme card will automatically set itself properly. However, this may not necessarily work with some networking gear.

If the two ends have different ideas about what mode the link is in, you may see "late collision" messages, dropped packets, or complete failure.

To force a particular mode, e.g. 100Mb FD, you can use ndd as follows:

# turn off autonegotiation
ndd -set /dev/hme adv_autoneg_cap 0
# turn on 100Mb full-duplex capability
ndd -set /dev/hme adv_100fdx_cap 1
# turn off 100Mb half-duplex capability
ndd -set /dev/hme adv_100hdx_cap 0
# turn off 10Mb full-duplex capability
ndd -set /dev/hme adv_10fdx_cap 0


# turn off 10Mb half-duplex capability
ndd -set /dev/hme adv_10hdx_cap 0

You may have to force the other end (e.g. switch) to use the same mode. Consult the manual for your switch. NB: Fast ethernet hubs are always 100Mb half-duplex, and ethernet hubs are always 10Mb half-duplex.

If you have more than one hme card in your system, before issuing the above ndd commands, you need to first select the specific hme card you want to set. For example, to select hme2, type:
ndd -set /dev/hme instance 2
Subsequent ndd commands to /dev/hme will only apply to hme2.

If you want to force all the hme cards on your system to a specific mode at machine boot, you can set hme driver variables in /etc/system. For example, to force all hme cards on the system to use 100Mbit FD, put the following in /etc/system:

set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=1
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10hdx_cap=0
set hme:hme_adv_10fdx_cap=0

posted by Brahma at 9:38 AM 0 comments

How do I find out what process is using a particular port?

Subject: 12.4) How do I find out what process is using a particular port?

Ports are held open in the same way as files are, by file handles within the process. In most states, a port will also have a handle into another process on the other side of that connection. If you need to find out which process is holding open a particular port, run lsof (ftp://ftp.cerias.purdue.edu/pub/tools/unix/sysutils/lsof) and grep for the port number.
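For example, assuming lsof is installed and TCP port 25 is the port of interest (both are just examples):

lsof -i :25

Without lsof, a rough sketch using the stock pfiles(1) command is to walk every process and look for the port among its socket endpoints:

for pid in `ps -e -o pid=`
do
    pfiles $pid 2>/dev/null | grep "port: 25" >/dev/null && echo $pid
done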

Thanks to Stuart Whitby <swhitby at legato dot com>

posted by Brahma at 9:37 AM 0 comments

I've forgotten the root password; how can I recover?

Subject: 15.1) I've forgotten the root password; how can I recover?

You need to have access to the machine's console.


1. Note the root partition (e.g. /dev/sd0a or /dev/dsk/c0t3d0s0).
2. Hit STOP-A or L1-A (or, on an ASCII terminal or emulator, send a <BREAK>) to halt the operating system, if it's running.
3. Boot single-user from CD-ROM (boot cdrom -s) or a network install/jumpstart server (boot net -s). (NB: if it asks you for a prom password, see below.)
4. Mount the root partition (e.g. /dev/dsk/c0t3d0s0) on "/a". "/a" is an empty mount point that exists at this stage of the installation procedure. (mount /dev/dsk/c0t3d0s0 /a)
5. Set your terminal type so you can use a full-screen editor, e.g. vi. (You can skip this step if you know how to use "ex" or "vi" from open mode.) If you're on a Sun console, type "TERM=sun; export TERM"; if you're using an ASCII terminal (or terminal emulator on a PC) for your console, set TERM to the terminal type (e.g. TERM=vt100; export TERM).
6. Edit the passwd file (/a/etc/passwd for SunOS 4.x, /a/etc/passwd.adjunct for SunOS 4.x with shadow passwords/C2 security, /a/etc/shadow for Solaris 2.x) and remove the encrypted password entry for root.
7. cd to /; type "umount /a".
8. Reboot as normal in single-user mode ("boot -s"). The root account will not have a password. Give it a new one using the passwd command.
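Spelled out for a Solaris 2.x box with its root filesystem on c0t3d0s0 (a sketch only; your disk device and console type will differ):

ok boot cdrom -s
# mount /dev/dsk/c0t3d0s0 /a
# TERM=sun; export TERM
# vi /a/etc/shadow           (blank out the encrypted password field of the root entry)
# cd /
# umount /a
# reboot -- -s
  ...
# passwd root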

Thanks to Stefan Voss <s dot voss at terradata dot de>

PROM passwords:

Naturally, you may not want anyone with physical access to the machine to be able to do the above to erase the root password. Suns have a security password mechanism in the PROM which can be set (this is turned off by default). The man page for the eeprom command describes this feature.

If security-mode is set to "command", the machine can only be booted without the prom password from the default device (i.e. booting from CD-ROM or an install server will require the prom password). Changing the root password in this case requires moving the default device (e.g. the boot disk) to a different SCSI target (or equivalent), and replacing it with a similarly bootable device for which the root password is known. If security-mode is set to "full", the machine cannot be booted without the prom password, even from the default device; defeating this requires replacing the NVRAM on the motherboard. "Full" security has its drawbacks -- if, during normal operations, the machine is power-cycled (e.g. by a power outage) or halted (e.g. by STOP-A), it cannot reboot without the intervention of someone who knows the prom password.

posted by Brahma at 9:37 AM 0 comments

modifying scripts with binary inside


Subject: SUMMARY: modifying scripts with binary inside

Hi Managers,

Sorry for the long delay. I have just completed fixing my problem.

I want to thank you all for your responses. At the end of this msg I have attached the 3-4 suggestions I had from people. I have not tried editing the binary, but I wanted you all to see the other suggestions.

My goal was to edit the text portion of the script w/o corrupting the binary.

First of all, this file did in fact have text and binary within one file (script). The script pulls the binary out with the "tail" cmd and then unzips, "tar xf"s and installs it. The binary contains patches and packages, so the script was huge, 240MB+. That posed another problem: the file was too large for vi.

I attacked this problem using Brad's method. Thank you Brad.

Objective: Remove the binary, edit the text portion, and still be able to run my script.

At first I wanted to reattach my edited text back to the binary, but instead I just made the edited script (which has no binary now) access the original script and work with the binary from that file. This worked wonderfully.

Solution:

## Take the binary out
$ mv scriptname scriptname.yuck
$ strings scriptname.yuck > scriptname
$ vi scriptname
## Above works but the binary was turned into crazy chars and stayed inside the
## file, so still too large for vi, so here are the cmds to deal w/ that
## quote from Brad

Use split(1). I'd recommend

split -b 1m pre-req.install.orig.nobin chunk

This will produce about 156 one-megabyte files, named chunk.aa, chunk.ab, etc.

Then use:


for FILE in chunk.*
do
file $FILE
done > /tmp/chunkstyle

Now scan /tmp/chunkstyle for the first file that file(1) reports to be non-text. If, for example, that one is chunk.je, edit chunk.jd to remove the binary characters at the end (if any), then:

cat chunk.[a-i]* chunk.j[a-d] > script

Afterwards, script should contain your script.

A better way is to find out which chunk contains the last line of the script. It's probably "exit 0", test this with:

grep "exit 0" pre-req.install.orig.nobin

If you only get one result, that's the string to use to find the end of the file in the middle of some chunk:

grep "exit 0" chunk.*

Then edit that chunk to get rid of binary characters, and cat using wildcards as above.

If you don't get the wildcards part, here's the scoop: chunk.[a-i]* matches chunk.aa, chunk.ab, ..., chunk.iz. You could also use chunk.[a-i][a-z]. The second expression (chunk.j[a-d]) matches chunk.ja, chunk.jb, chunk.jc, and chunk.jd. This assumes that the end of the script and the beginning of the binary data is in the file chunk.jd.

Be sure to read the man page for split(1), and of course, take great care not to clobber your original, pre-split files.
## END quote from Brad

I just edited the chunk that contained the text script portion, which was the first one. All the other chunks had crazy chars (very much readable) that represented the binary. Like I said up top, my edited script (no bin) was modified to just access and create the tar file from the orig. text/binary script.

Suggestions from others:

(You can use plain old Bill Joy vi just fine. vim, on the other hand, will corrupt the binary portion.)

(You probably need a hex editor. Try this one <http://www.hhdsoftware.com/hexeditor.html>)

(If you can replace text with the same number of characters you may get


away with using emacs. If you're not familiar with emacs, play with it a little on some other files first. It's very powerful, but it has a bit of a learning curve. Also, and this should go without saying, work on a copy of the script, not the original, and test the modified version before putting it up for general consumption. Allan)

(Perl... but vi should work, too.)

> Subject: modifying scripts with binary inside
>
> I am just curious as to what is used to modify text scripts that also have binaries inside of them.
> I want to modify the text only portion. I was told you can't use vi.
>
> Thank you for your time.
>
> Al

posted by Brahma at 9:09 AM 0 comments

V440 hardware disk mirroring

Subject: Summary: V440 hardware disk mirroring

Many thanks to everyone who replied. The builtin LSI controller does not show any progress indicator when syncing disks, i.e. '% complete'; there is no extra arg to the 'raidctl' cmd that will cause a progress indicator to be displayed.

Only two of the four internal disks can be mirrored with the builtin hardware RAID controller; the other 2 internal disks can be mirrored with SVM.
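For reference, a hedged sketch of the raidctl commands involved (the disk names are examples; check raidctl(1M) for your firmware revision):

raidctl -c c1t0d0 c1t1d0     (mirror the first two internal disks; c1t1d0 becomes the secondary)
raidctl                      (with no arguments, lists the volume and its status, e.g. RESYNCING
                              while the disks are still syncing, then OK - but no percent-complete figure)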

George

> I have a bunch of newly bought V440 servers and am keen on using the
> Hardware Mirroring feature.
>
> One thing I have noticed is that there does not seem to be any indication
> of progress when rebuilding/syncing the disks.
>
> Q:
> o does anyone know where I can get more information on this
>   hardware feature ?
>
> o is there a way to see the progress when it is syncing disks ?


>
> o after a server is built and mirroring is active, can one disk
>   be pulled out and used in another server as a base for that new
>   system ? I've been doing this with DiskSuite for some time now but
>   am not sure how this will work under hardware mirroring.
>
> o does anyone have any good or bad things in regards to
>   hardware mirroring that might help me ?
>

posted by Brahma at 9:08 AM 0 comments

file belongs to which package ?

How do I know which file (such as /usr/lib/libgtk-x11-2.0.so) belongs to which package?

% pkgchk -l -p <filename>
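For example, using the file from the question (the grep alternative simply searches the install database directly; exact output varies between releases):

pkgchk -l -p /usr/lib/libgtk-x11-2.0.so
grep '/usr/lib/libgtk-x11-2.0.so' /var/sadm/install/contents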

--

posted by Brahma at 8:01 AM 0 comments

Summary: Sun patches cluster broke sendmail

Subject: Summary: Sun patches cluster broke sendmail

Thanks to William Cole, Rob Windsor and Christ Clark. My problem was that a patch broke my sendmail.

All told me the reason was because there was a sendmail package original from Sun remaining in the system, while we have been using a locally compiled sendmail (SUNWsndmu and SUNWsndmr from patch 110615-13). Hence the mixup.
Version 8.11.7p1+Sun

I just had to

# pkginfo|grep -i sendmail
8_Recommended# find . -name 'SUNWsndm*'

Here are some lessons.

"Go look for the patches from the last batch that touched sendmail in /var/sadm/pkg/sndmr/save and /var/sadm/pkg/sndmu/save."

"The overall lesson is that you should always work with the package system when changing any file on Solaris that is part of an existing Sun package. The patch tools all assume that the package database is correct, so if you are going to modify or replace any file that already has an entry in /var/sadm/install/contents you should start by removing whatever package that file is part of and replacing every piece of that package, preferably with your own packaged collection."

Thanks all.

======= original post ==============

- Solaris 8, SunFire280R -

I applied the latest recommended patch cluster and suddenly I'm seeing problems with my sendmail. Could anyone with similar experience help me pinpoint the problem patch and tell me if I can remove it? If this is not a proper forum to ask, can I get some pointers on where to look?

Problems I see are:

- We used to be on V6, but now sendmail requires V4 for the queue file version. (unsupported qf file version error)

- rRFC line causes invalid queue format error (bad line error), and it's "almost" the same as RPFD line, so I remove them by scripts, but I'm not comfortable with that.

My problem is similar to what's described here.

http://www.issociate.de/board/post/213139/Force_sendmail_8.13_to_write_V2-style_qf_files_...._is_it_possible?.html

There's a slight complication, since we are running 3 daemons, 1st and 3rd in the main host and 2nd in a linux box which only handles amavis processing.

sendmail -d0.1 -bv root shows:

1) main host
Version 8.11.7p1+Sun

2) amavis host
Version 8.13.1


The "+Sun" part is really giving me a headache. That patch seems to be a retro...

I have some customized configuration so it's not easy to upgrade the sendmail version. I had to ask for permission to apply patches for a very long time before getting this chance. And now I find myself guilty. Which makes me feel I can't justify the time spent on fixing them, so I'm hoping to find the problem and take that specific patch out...

posted by Brahma at 7:59 AM 0 comments

Share cdrom Jumpstart E10k

After talking with a Sun Support Engineer I finally decided to share the cdrom and run the add_install_client off the cdrom

it works

so

1) # share -F nfs -o ro,anon=0 /cdrom/cdrom0/s0
2) # cd /cdrom/cdrom0/s0/Solaris_2.6/Tools
3) # ./add_install_client newdomain sun4u1

and output is

<#20> ok
<#20> ok limit-ecache-size
<#20> ok boot qfe0 -sv
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
2ee00 X
Requesting Internet address for 0:0:be:a6:a3:a9
Internet address is 192.168.42.50 = C0A82A32
hostname: banner-a
whoami: no domain name
root server: sspp-qfe2
root directory: /cdrom/sol_2_6_598_sparc_smcc_svr/s0/Solaris_2.6/Tools/Boot
SunOS Release 5.6 Version Generic_105181-05 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
whoami: no domain name
Configuring devices...
SUNW,qfe7: Link Down - cable problem?
SUNW,qfe7: Link Down - cable problem?
SUNW,qfe7: Link Down - cable problem?
SUNW,qfe6: Link Down - cable problem?
SUNW,qfe6: Link Down - cable problem?


SUNW,qfe6: Link Down - cable problem?
SUNW,qfe5: Link Down - cable problem?
SUNW,qfe5: Link Down - cable problem?
SUNW,qfe5: Link Down - cable problem?
SUNW,qfe4: Link Down - cable problem?
SUNW,qfe4: Link Down - cable problem?
SUNW,qfe4: Link Down - cable problem?
SUNW,qfe3: Link Down - cable problem?
SUNW,qfe3: Link Down - cable problem?
SUNW,qfe3: Link Down - cable problem?
SUNW,qfe1: Link Down - cable problem?
SUNW,qfe1: Link Down - cable problem?
SUNW,qfe1: Link Down - cable problem?

etc etc etc

> -----Original Message-----
> From: Saxon, Stuart
> Sent: 05 September 2005 18:02
> To: '[email protected]'
> Subject: 10k jumpstart solaris 2.6 problems
>
> Can some kind soul please tell me what am I doing wrong here .......... thanks
>
> 10k netcon output
> ---------------------------
> SUNW,Ultra-Enterprise-10000, using Network Console
> OpenBoot 3.2.136, 10240 MB memory installed, Serial #10920873.
> Ethernet address 0:0:be:a6:a3:a9, Host ID: 80a6a3a9.
>
> <#20> ok
> <#20> ok limit-ecache-size
> <#20> ok boot qfe0 -sv
> Boot device: /sbus@55,0/SUNW,qfe@0,8c20000 File and args: -sv
> Timeout waiting for ARP/RARP packet
> Timeout waiting for ARP/RARP packet
> Timeout waiting for ARP/RARP packet
> Timeout waiting for ARP/RARP packet
> 2ee00 X
> Requesting Internet address for 0:0:be:a6:a3:a9
> Internet address is 192.168.42.50 = C0A82A32


> hostname: banner-a
> whoami: no domain name
> root server: sspp-qfe2
> root directory: /export/install/Solaris_2.6/Tools/Boot
> RPC: Timed out.
> lookup: RPC error.
> boot: cannot open kernel/unix
> Enter filename [kernel/unix]:
>
> shares on the ssp
> ---------------------------
> root@sspp # share
> - /export/install ro,anon=0 ""
>
> /etc/bootparams
> -----------------------
>
> root@sspp # cat /etc/bootparams
> banner-a root=sspp-qfe2:/export/install/Solaris_2.6/Tools/Boot install=sspp-qfe2:/export/install/Solaris_2.6 boottype=:in rootopts=:rsize=32768
>

>

posted by Brahma at 7:59 AM 0 comments

Remove Disk Label

Dave Uhring wrote:

>> And execute `suninstall` after the disks are labelled.

>Well what I now did is create my own disk type with the harddisk
>specifications from www.seagate.com and labelled the disk but still only
>8 GB is seen. I am now clueless and don't know what to do to get the 20
>GB available.
>Any other ideas ?

Did you try:

dd if=/dev/zero of=/dev/rdsk/cXtYdZs2 count=16


and then restart so that it looks unlabeled.
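With a concrete (example) device, and only after triple-checking which disk you are pointing at, that would be:

dd if=/dev/zero of=/dev/rdsk/c0t1d0s2 count=16

After the reboot, format (or suninstall) should report the disk as unlabeled and let you write a fresh label that covers the full geometry.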

Casper

posted by Brahma at 7:57 AM 0 comments

CDE/dtterm: any *mouseless* way to "blacken" (whole screen?)

I noticed that 2-click selects word, 3-click selects line, so I tried 4-click to see what happens :-)

Reply

Subject: Re: CDE/dtterm: any *mouseless* way to "blacken" (whole screen?)? Via key-stroke?

> >> Any key-direct way to "choose" the whole screen?

> >I don't think dtterm (or xterm) has a Select All (mouseless) feature.
> >The shortest way I can think of is to quad-click on the screen
> >to select all. It's not mouseless but beats dragging the mouse.

> A "quad-click"? 4 clicks, fast?

> What (short doc) did you read to learn that?

Actually, the quadruple click was defined in the Nextstep Human Interface Guidelines - the reasoning was:

Single click    - set insertion point
Double click    - select word
Triple click    - select line
Quadruple click - select all text

A somewhat logical progression. Mind you, I'm going off my memory here (it's been a while since I've messed about with OpenSTEP).

--Ken

posted by Brahma at 7:56 AM 0 comments

Check Process

Hi,
I have another request.


Simulate this problem, and then when ps is hung, take truss & pstack output of the process, using the options for showing complete contents of buffers while reading & writing. That should help us to find out why ps is hanging.
Thanks,
~Amol

srv216bdnfs# truss -eaf -o /tmp/truss_ps-eaf.txt -p 2569

srv216bdnfs# pstack 2569

srv216bdnfs# truss -eaf -o /tmp/truss_ps-eaf_other.txt ps -eaf

posted by Brahma at 7:56 AM 0 comments

How Solaris[TM] Operating System calculates available swap.

*Keyword(s):* swap, swapfs, swapfs_minfree, tmpfs

*Description:*

When configuring swap space on a system, it is useful to know how Solaris[TM] Operating System calculates available swap.

*Document Body:*

Swap size is equal to all available physical memory (RAM swap) plus any physical disk partitions dedicated for swap, plus space allocated to swap in the form of swapfiles.

The amount of memory available for RAM swap is less than the size of physical memory. This is because only pageable memory is available for RAM swap. Memory that is not pageable includes most of the kernel, any memory that is mlock()ed, and intimate shared memory (ISM).

You can see the summary of memory by running "netstat -k system_pages" or "kstat -n system_pages".

Note that the -k option to the netstat command is undocumented and its use is deprecated. In Solaris 10 the -k option is unavailable and "kstat -n system_pages" must be used. The output format of "kstat -n" differs from that of "netstat -k"; however, all the values of interest are present in "kstat -n" output and have the same names as they do in the output of "netstat -k", with the exception of "pages total", which is renamed "pagestotal".


The discussion below uses "netstat -k"; similar values for "pp_kernel", "pageslocked" and "pagestotal" would be displayed by "kstat -n system_pages".

% netstat -k system_pages
system_pages:

physmem 15153 nalloc 5931096 nfree 5868115 nalloc_calls 3092 nfree_calls 2386
kernelbase 268463432 econtig 280756224 freemem 1719 availrmem 11932
lotsfree 236 desfree 118 minfree 59 fastscan 7568 slowscan 100 nscan 0 desscan 25
pp_kernel 2773 pagesfree 1719 pageslocked 3204 pages total 15136
^^^^^^^^^^^^^^                ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^

The kernel on this 128MB machine is 2773 pages, or 21MB, and the total size of locked memory (including the kernel) is 3204 pages, 25MB. (Note that on Solaris 10 and above, pp_kernel excludes pageslocked, and thus they must be added.) A page is 8KB (8192 bytes).

One would expect this machine to have a virtual swap size of: 128MB - 25MB = 103MB. Note however that "pages total" is 15136 pages, which corresponds to 118MB and not the 128MB of physical memory installed.

During boot, some of the system's memory is used for statically-allocated portions of the kernel, such as the kernel text, TSBs (translation storage buffers), page structures, the page table, etc. Memory left after those allocations is shown in the physmem above. In this case, there is 118MB (15153 * pagesize, or 8KB) of memory for which page structures are created. (Note that on Solaris 10 and above, this memory is already accounted for in pp_kernel.)

So, we have 118MB - 25MB = 93MB of pageable virtual memory, but there is one last factor. The swap file system leaves a reserve of 1/8th of memory, or 118/8 = 14.6MB. The minimum for this is 2MB.

Summary:

Total Memory         128MB
- static kernel       10MB
- Kernel              25MB
- swapfs_minfree      15MB
--------------------------
Total Ram Swap        78MB

Total Swap space = Ram swap + disk swap
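As a rough cross-check on a live system, the same arithmetic can be pulled straight out of kstat; a sketch (the kstat values are in pages, so multiply by the page size, and the swapfs reserve of 1/8th of memory, minimum 2MB, still has to be subtracted):

total=`kstat -p unix:0:system_pages:pagestotal | awk '{print $2}'`
locked=`kstat -p unix:0:system_pages:pageslocked | awk '{print $2}'`
psize=`pagesize`
pageable=`expr \( $total - $locked \) \* $psize / 1048576`
echo "pageable memory (before the swapfs reserve): $pageable MB"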


How to read df -k /tmp output:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filesystem            kbytes    used   avail capacity  Mounted on
swap                   52280     664   51616     2%    /tmp

tmpfs (when mounted on /tmp) dynamically changes its size depending upon how much memory (or swap) is available.

The "kbytes" column in "df -k /tmp" output is the amount of swap space available, rather than the total.

The tmpfs file system also has a minfree, so the total is slightly less than the amount of swap available. The "kbytes" column of "df -k /tmp" output actually corresponds to the available swap shown by "swap -s". Normally, these two numbers are pretty close.

The difference is due to the tmpfs_minfree value, which is 2MB by default.

% swap -s
total: 120272k bytes allocated + 26696k reserved = 146968k used, 53696k available

If some process were to de-allocate some swap, so that swap -s showed more swap available, then df -k /tmp would also show that its total file system size has increased.

posted by Brahma at 7:55 AM 0 comments

Fibre Channel

Fibre Channel

Like Ethernet or ATM, Fibre Channel is a networking standard that is designed to move data through specific devices at specific speeds. Fibre Channel is used primarily for server backbones and as a way of attaching a server to a storage device, such as a RAID array or a tape backup device. In fact, Fibre Channel is the architecture of choice for many storage area networks.

Many IT pros find that Fibre Channel is an answer to their storage prayers. Since a company's data grows daily, each night the system is backing up a little bit more data than the night before. Thus, the window for completing the backup tends to shrink a little bit each year. The only way to back up more data in less time is to get a faster storage device and a faster medium for transmitting the data


from the server to the storage device. In production networks, Fibre Channel products have been able to accomplish a sustained transfer rate of 97 MB per second when backing up large files. Companies that use Fibre Channel on database servers have reported these servers can handle tens of thousands of I/Os per second due to Fibre Channel technology.

posted by Brahma at 7:54 AM 0 comments

Resetting NVRAM on a Sun system

Document Audience: SPECTRUM
Document ID: 7047
Title: Resetting NVRAM on a Sun system
Update Date: Tue Sep 13 00:00:00 MDT 2005
Products: Solaris, Sun Blade 2000 Workstation, Sun Blade 1000 Workstation, Sun Fire 280R Server, Sun Fire V880 Server, Sun Fire V480 Server, Sun Fire V890 Server, Sun Fire V490 Server
Technical Areas: NVRAM (Non-Volatile Read-Only Memory)

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

Keyword(s): resetting, nvram, netra 20, sun fire, sunblade, sun, systems

Description:

This document explains how to reset the NVRAM back to its default settings on a Sun system.

Document Body:

NOTE: Before executing the following, it is important and highly recommended to note (print out) the current NVRAM settings before changing these options back to their default settings.

There are two ways of resetting the NVRAM:

1) At the OK prompt:


ok set-defaults                 resets most parameters
ok set-default 'parameter'      resets that one parameter
   for example: ok set-default auto-boot?

2) At boot

This is helpful when an improperly created device alias for a monitor has been created and no display is going to the screen.

The most common method is to issue L1+n from the keyboard (WYSE) while powering on the system. Hold down this key combination until you see video with a message stating that the NVRAM parameters have been set to their default values.

On some Sun keyboards, the "L1" key is replaced with a "STOP" key. Therefore, issue the key sequence STOP+n simultaneously from the keyboard while powering on the system. Hold down this key combination until you see video with a message stating that the NVRAM parameters have been set to their default values.

Newer systems (Sun Blade[TM] workstations and Sun Fire[TM] hardware) are using USB keyboards and mice. Unlike the older 8-pin mini-din keyboards, these systems do not have a "STOP+N" key sequence to reset the OBP/NVRAM parameters. Instead, a "safe NVRAM" boot mode is available. Remember that unlike a STOP+N, which restores ALL factory NVRAM parameters, the "safe NVRAM" boot only alters a few parameters and only for that one boot cycle. The described work around allows booting any USB keyboard type system, including RSC equipped servers, to the OK prompt and communicating with these systems via the serial ttya console.

Note: For security reasons this procedure does not reset the settings for


security-mode and security-password. The information from this Infodoc cannot be used to recover from a lost OBP password. Please contact Sun Service to get a replacement NVRAM in such circumstances.

==================== Procedure ====================

1. Press the power button to power up the system.

2. Once the maintenance LED starts to flash rapidly, immediately press the Power button twice (similar to double-clicking a mouse, but leave a short gap of around 1 second between presses, to have the action reliably registered). The actual time when you do the double press of the power button is the point in POST when the maintenance LED (wrench light) rapidly flashes.

NOTE:

If you double press too late or too soon, the system will power off. Do not get confused with the earlier occurrences of the maintenance and OK-to-remove LEDs flashing together. This is a part of the system test to make sure the LEDs are functional.

If you're running POST even in MIN level, it will be several minutes, depending upon your configuration, before the maintenance LED flashes rapidly.

A screen similar to the following is displayed to indicate that you have successfully reset the OpenBoot NVRAM configuration variables to their default values:

Sun Fire xxx (8 X UltraSPARC-III), Keyboard Present

OpenBoot x.x, 256 MB memory installed, Serial #xxxxxxxx.


Ethernet address xx:xx:xx:xx:xx:xx, Host ID: xxxxxxxx.

Safe NVRAM mode, the following nvram configuration variables have been overridden:

diag-switch? is true
use-nvramrc? is false
input-device, output-device are defaulted
ttya-mode, ttyb-mode are defaulted

These changes are temporary and the original values will be restored after the next hardware or software reset. Once you are at the OK prompt, it is at this point you make your changes to the OBP parameters, or use set-defaults to reset all parameters to factory default settings.

On the Netra[TM] 20 server, the defaults cannot be set by hitting the power button twice. The power button does not function in the same way as it does on the Sun Fire or Sun Blade. The LOM command bootmode will need to be used as shown below:

lom> poweroff

lom> bootmode reset_nvram

lom> poweron

This will reset the nvram to default values.

Note: If the RSC card is your console device, resetting the NVRAM will take the console away from the RSC and put it back to the default (example, input-device

posted by Brahma at 7:54 AM 0 comments

useradd/passwd script


RE: useradd/passwd script

I found this script that might work for you. I haven't tried it myself. I hope it helps.

Here is the link for this script:

http://groups.google.com/group/comp.os.linux.misc/msg/b2867efd801e77ab?dmode=source&hl=en

#!/bin/sh

## SCRIPT TO ADD MULTIPLE USERS TO A LINUX SYSTEM
##
## The script will add users, generate secure password and mail
## info to the users. Also a log file is made!
##
## You need to make it work:
##
## mailx - traditional command-line-mode mail user agent
## pwgen - password generator
##         http://sourceforge.net/projects/pwgen/
##
## user_list format: USERNAME NAMES LASTNAME CLASS EMAIL
##
## (c) 2005 Manuel de la Torre
##

# Modify these variables if you need

MINDAYS=0       # Change password at anytime
MAXDAYS=45      # Max days password is valid
WARNDAYS=10     # Warning message before expire passwd
EXPDAYS=180     # Days to expire account from now
INACTIVE=45     # Days to lock after passwd expires

# Calculate days from Epoch
YEARS_FROM_EPOCH="$((($(date +%G) - 1970 ) * 365 ))"
DAYS_THIS_YEAR="$((($(date +%j))))"
DAYS_FROM_EPOCH=$(( $YEARS_FROM_EPOCH + $DAYS_THIS_YEAR + 8 ))

# Define some colors first:
red='\e[0;31m'
RED='\e[1;31m'
blue='\e[0;34m'
BLUE='\e[1;34m'
cyan='\e[0;36m'


CYAN='\e[1;36m'
NC='\e[0m'              # No Color

# Ensure that root is running the script
WHOAMI=`/usr/bin/whoami`
if [ $WHOAMI != "root" ]; then
    echo "Sorry. You must be root to add new users"
    exit 1
fi

# Ensure proper format of the command

thiscmd=`basename $0`

if [ "$#" -ne 1 ]; then
    echo "USAGE: $thiscmd user_file" && exit 1
fi

USR_FILE=$1

# Remove blank lines from input file
# Used this solution because of problems
# with the IFS in a if [ -n ] statement
#

# Check if buffer file exists, then remove

if [ -a /tmp/buffer ]
then
    rm /tmp/buffer

fi

# Read input file, and delete blank lines

cat $USR_FILE | while read TEMP

do
    if [ -n "$TEMP" ]; then
        echo "$TEMP" >> /tmp/buffer
    fi
done

# Copy temporal file to input file


cp /tmp/buffer $USR_FILE
rm /tmp/buffer

## Save the current value of the IFS
ifs="$IFS"

# Define the separator (TAB) between fields
# if your input has tabs between fields
#IFS=`echo t | tr t '\t'`

# Define the separator (COMMA) between fields
# if your input has commas between fields
IFS=","

# assuming the file has one line per user, in a layout like:
#
# USERNAME NAMES LASTNAME CLASS EMAIL
#

# Configure the useradd program globally:
# useradd -D -b $DEF_HOME -e $EXPIRE -g $GROUP

cat $USR_FILE | while read USERNAME NAMES LASTNAME CLASS EMAIL

do

USERNAME=`echo $USERNAME | tr A-Z a-z`      # lower case
FULLNAME="$NAMES $LASTNAME"
COMMENT="$FULLNAME,$CLASS"

# Check if user exists in system

NOEXISTE=`cut -d: -f1 /etc/passwd | grep -i $USERNAME`

if [ -n "$NOEXISTE" ]; then
    echo -e "Creating user $USERNAME: \t ${RED}FAILED${NC}"
else
    # Some output to keep you happy
    echo -e "Creating user $USERNAME: \t ${CYAN}SUCCESS${NC}"

# Add the user

useradd $USERNAME -c "$COMMENT" -m

# Set the initial password


PASSWORD=`pwgen -s`
echo $USERNAME:$PASSWORD | chpasswd

# Change expiration of passwords

chage -m $MINDAYS -M $MAXDAYS -E $(( $EXPDAYS + $DAYS_FROM_EPOCH )) -I $INACTIVE -d 0 $USERNAME

# Mail password

echo -e "login: $USERNAME \npassw: $PASSWORD" | mail -s "Account Info" -b <email@removed> $EMAIL

# Log the results
echo "$USERNAME:$FULLNAME:$PASSWORD:$CLASS:`date`" >> users_created_log

fi
done
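A hypothetical run (the script name, file name and data below are made up purely to show the expected comma-separated layout):

$ cat newusers.txt
jdoe,John,Doe,CS101,jdoe@example.edu
asmith,Anne,Smith,CS101,asmith@example.edu

# ./adduser.sh newusers.txt
Creating user jdoe:         SUCCESS
Creating user asmith:       SUCCESS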

posted by Brahma at 7:52 AM 0 comments

Password Aging

Password Aging, Part 1
ITworld.com, Unix in the Enterprise 10/4/05

Sandra Henry-Stocker, ITworld.com

While it's clearly possible to use the /etc/passwd and /etc/shadow files in Solaris and other Unix systems without making use of the password aging features, you could be taking advantage of these features to encourage your users to practice better security -- and, with the right password aging values, you can configure a good password-changing policy into your system files while limiting the risk that your users will be locked out of their accounts.

In this week's column, we look at the various fields in the shadow file that govern password aging and suggest settings that might give you the right balance between user convenience and good password security.

The /etc/shadow File


To begin our review of how password aging works on a Solaris system, let's examine the format of the /etc/shadow file. Each colon-separated record looks like this:

johndoe:PaSsWoRdxye7d:13062:30:120:10:inactive:expire:
   ^         ^          ^   ^   ^  ^     ^       ^    ^
   |         |          |   |   |  |     |       |    |
username:password:lastchg:min:max:warn:inactive:expire:flag

The first field is clearly the username. The next is the password encryption. The third is the date when the password was last changed, expressed as the number of days since January 1, 1970. The min field is the number of days that a password MUST be kept after it is changed; this is used to keep users from changing their passwords and then immediately changing them back to their previous values (thereby invalidating the intended security). The max field represents the maximum number of days that any password can be used before it is expired. If you want your users to strictly change their passwords every 30 days, for example, you could set both of these fields to 30. Generally, however, the max field is set to a considerably larger value than min. The warn field specifies the number of days prior to a password expiration that a user is warned on login that his/her password is about to expire. This should not be too short a period of time since many users don't log in every day and the display of this message in the login messages is easy to overlook.

The inactive field sets the number of days that an account is allowed to be inactive. This value can help prevent idle accounts from being broken into. The expire field represents the absolute day (expressed as the number of days since January 1, 1970) that the password will expire. You might use this field if you want all of your users' passwords to expire at the end of the fiscal year or at the end of the semester. The last field, flag, is unused until Solaris 10, at which point it records the number of failed login attempts.

If the lines in your shadow file look like this:

sbob:dZlJpUNyyusab:12345::::::

The username and password are set and the date on which the password was last changed has been recorded, but no password aging is taking effect.

If it looks like this, the account is locked.

dumbo:*LK*:::::::


Various other combinations of the shadow file are possible, but the min, max and warn fields will only make sense if the lastchg field is set. For example:

jdoe:w0qjde84kr%p0:13062:60:::::

User must keep a password for 60 days once he changes it, but no password changes are required.

jdoe:w0qjde84kr%p0:13062::60::::

User must change his password every 60 days, but can change it at anytime (including immediately changing it back to its previous value).

Choosing Min and Max Settings

If you want to turn on password aging, the combination of minimum (must keep) and maximum (invalid after) values enforces a practical password update scheme. Suggested settings depend in part on the security stance of your particular network. However, general consensus seems to be that passwords, once changed, should be kept for a month (min=30) and that passwords should be changed every three to six months (from max=90 to max=180).
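On Solaris those values can be set per account with the passwd command rather than by editing /etc/shadow by hand; for instance (the username is just an example):

passwd -n 30 -x 90 -w 10 jdoe      (min 30 days, max 90 days, warn 10 days before expiry)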

Once a user has used a password for 30 days, he's probably not going to reset it back to its previous value. By then he should know it well enough to continue using it.

Changing a password more often than every month or so would probably make it hard for users to remember their passwords without writing them down.

The down side of min values is that this setting doesn't allow someone to change his password if he believes it has been compromised when the compromise happens within the "min" period. Whatever system you adopt should, therefore, make it painless for a user to request that his password be reset whenever he believes it may no longer be secure.

Wrap Up

We hear a lot about the tradeoff between security and convenience as it permeates so many of our decisions about how we manage our networks but, when it comes to passwords, we must be careful not to cross the line between securing logins and preventing them altogether. Locking our users too easily out of their accounts can reduce security as easily as enhance it. Using password aging with the proper settings


can limit the risk that security constraints turn into unintended denials of service.

Next week, we'll look at how to introduce password aging on a system where users have never had their passwords expire.

posted by Brahma at 7:52 AM 0 comments

Wednesday, October 05, 2005

DNS client Troubleshooting

DNS trouble..

nslookup always uses DNS.
ping uses /etc/nsswitch.conf to determine which method of address lookup to use.

Recommend that you

# cp /etc/nsswitch.dns /etc/nsswitch.conf

does ping with an IP number work? If it does then there is something wrong with nsswitch.conf or possibly nscd or even /etc/hosts.

Try viewing a webpage by IP address, for example: http://209.249.116.195 (Sun Site). If this doesn't work then it's a routing problem; if it does, then you need to check your DNS files: resolv.conf et al.

Also worth (as suggested by someone else) posting the output of:

dig www.sun.com @204.186.23.65
and maybe,
dig www.sun.com @192.18.128.11

dns2 is outside my main network (204.184.23.65)
dig @dns2 google.com works
dig google.com works
dig @gate google.com does not work, as expected, since it is not a dns server

I think that since I can go to websites by IP fine, my defaultroute must be correct, same with netmasks (it has to be 255.255.0.0 since we use both 192.168.10 and 192.168.20 on this section of the network).


double check /etc/nsswitch.conf is called exactly that and has appropriate perms. I don't know what they _should_ be, but 644 shouldn't hurt. And it contains at least

hosts: files dns

double check /etc/resolv.conf is called exactly that and contains appropriate entries.

then pick an external address that you *know* won't be in /etc/hosts OR have been cached by nscd. But one that you know has at least one IP address

then try

getent hosts your.chosen.host

getent is an interface to your system resolvers and is dictated by /etc/nsswitch.conf and, if appropriate, /etc/resolv.conf

If this displays nothing, "echo $?" will show the return of the getent - a failed lookup is "2". A successful one is "0".

A successful one though, should show you an address and stuff,

If this fails, try rebuilding /etc/nsswitch.conf and /etc/resolv.conf to ensure there are no parsing errors or the like...

'nslookup' uses the DNS resolver libraries directly. It thus bypasses the name service switch (configured via /etc/nsswitch.conf). If you're seeing applications fail to resolve addresses, but nslookup works fine, then the next thing to try is this:

% getent hosts "host name here"

If that works, then it's an application problem of some sort. The application is just misconfigured (perhaps it has a SOCKS proxy configured that it doesn't need, or needs one and doesn't have it).

If that doesn't work, then the problem is almost certainly in /etc/nsswitch.conf.

Check the firewall, whether port 53 tcp and udp are passed. My guess is that only one of them may pass.

posted by Brahma at 1:14 PM 0 comments


Sun Solaris 10 on 280R & Brocade 3250 Attachment and Configuration

Subject: Sun Solaris 10 on 280R & Brocade 3250 Attachment and Configuration

Server: Sun 280R
Card:   *SUN* 2342 Dual Port HBA (uses QLC driver)
Switch: Brocade Silkworm 3250.

I need some assistance getting a Sun 280R to see tape drives attached to a Brocade 3250 switch.

[switch] -> box   (port 0)
         -> tape1 (port 1)
         -> tape2 (port 2)
         -> tape3 (port 3)
         -> tape4 (port 4)

Here are my devices according to a Brocade Silkworm 3820. I am not sure I am adding the zones correctly.

Basically, I want one zone per drive:

DRIVE1 <-> ZONE1
DRIVE2 <-> ZONE2
DRIVE3 <-> ZONE3
DRIVE4 <-> ZONE4

However, my first step is just to get the Solaris 10 box to see all four tape drives.

Here are the devices:

s3250:admin> nsshow
{
 Type Pid     COS  PortName                 NodeName                 TTL(sec)
 N    010000;   3; 21:00:00:e0:8b:14:12:05; 20:00:00:e0:8b:14:12:05; na
    FC4s: FCIP FCP
    Fabric Port Name: 20:00:00:05:1e:35:78:77
 NL   010155;   3; 50:01:04:f0:00:52:cf:1f; 50:01:04:f0:00:52:cf:1e; na
    FC4s: FCP [HP Ultrium 2-SCSI K4B0]
    Fabric Port Name: 20:01:00:05:1e:35:78:77
 NL   010255;   3; 50:01:04:f0:00:52:cf:22; 50:01:04:f0:00:52:cf:21; na
    FC4s: FCP [HP Ultrium 2-SCSI K4B0]


    Fabric Port Name: 20:02:00:05:1e:35:78:77
 NL   010355;   3; 50:01:04:f0:00:52:cf:1c; 50:01:04:f0:00:52:cf:1b; na
    FC4s: FCP [HP Ultrium 2-SCSI K4B0]
    Fabric Port Name: 20:03:00:05:1e:35:78:77
 NL   010455;   3; 50:01:04:f0:00:52:cf:25; 50:01:04:f0:00:52:cf:24; na
    FC4s: FCP [HP Ultrium 2-SCSI K4B0]
    Fabric Port Name: 20:04:00:05:1e:35:78:77
The Local Name Server has 5 entries }

s3250:admin> zonecreate "greenzone", "3,21; 3,50"
s3250:admin> zonecreate "redzone", "21:00:00:e0:8b:14:12:05; 4,3"
s3250:admin> cfgsave

--

On the Solaris side of the fence:

I believe c3 and c4 are the Sun QLogic (QLC) 2342 adapter.

bash-3.00# cfgadm -c configure c4
bash-3.00#

bash-3.00# cfgadm -alv
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
c0                             connected    configured   unknown
bash-3.00# cfgadm -c configure c4 ; cfgadm -alv
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
c0                             connected    configured   unknown
unavailable  scsi-bus     n        /devices/pci@8,700000/scsi@6:scsi
c0::dsk/c0t6d0                 connected    configured   unknown    TOSHIBA DVD-ROM SD-M1401
unavailable  CD-ROM       n        /devices/pci@8,700000/scsi@6:scsi::dsk/c0t6d0
c1                             connected    configured   unknown
unavailable  fc-private   n        /devices/pci@8,600000/SUNW,qlc@4/fp@0,0:fc
c1::2100000c500187f5           connected    configured   unknown    SEAGATE ST373307FSUN72G
unavailable  disk         y        /devices/pci@8,600000/SUNW,qlc@4/fp@0,0:fc::2100000c500187f5
c1::2100000c50018daa           connected    configured   unknown    SEAGATE ST373307FSUN72G
unavailable  disk         n


/devices/pci@8,600000/SUNW,qlc@4/fp@0,0:fc::2100000c50018daa
c2                             connected    unconfigured unknown
unavailable  scsi-bus     n        /devices/pci@8,700000/scsi@6,1:scsi
c3                             connected    unconfigured unknown
unavailable  fc-fabric    n        /devices/pci@8,600000/SUNW,qlc@1/fp@0,0:fc
c4                             connected    unconfigured unknown
unavailable  fc           n        /devices/pci@8,600000/SUNW,qlc@1,1/fp@0,0:fc
usb0/1                         empty        unconfigured ok
unavailable  unknown      n        /devices/pci@8,700000/usb@5,3:1
usb0/2                         empty        unconfigured ok
unavailable  unknown      n        /devices/pci@8,700000/usb@5,3:2
usb0/3                         empty        unconfigured ok
unavailable  unknown      n        /devices/pci@8,700000/usb@5,3:3
usb0/4                         empty        unconfigured ok
unavailable  unknown      n        /devices/pci@8,700000/usb@5,3:4
bash-3.00#

Does anyone have a good doc for attaching Fibre Channel Tape Drives to a brocade switch and then getting Solaris to see the devices? :)

Thanks!

Justin.

posted by Brahma at 9:29 AM 0 comments

Disk usage monitoring

Disk usage monitoring

Takeaway: Learn to use scripting and cron to run regular scripts that will monitor your disk space.


Monitoring disk usage is a critical part of running any server. There is nothing worse than having a runaway process or malicious user fill up a filesystem only to find out about it when the boss calls to indicate she can't connect to or write to the server. With a little bit of shell scripting and cron usage, you can have a script run every hour or at any other interval that will indicate if you need to be concerned about disk space filling up.


The script itself is very small:

#!/bin/sh

fs=`mount|egrep '^/dev'|grep -iv cdrom| awk '{print $3}'`

typeset -i thresh="90"

typeset -i warn="95"

for i in $fs

do

skip=0

typeset -i used=`df -k $i|tail -1|awk '{print $5}'|cut -d "%" -f 1`

if [ "$used" -ge "$warn" ]; then

echo "CRITICAL: filesystem $i is $used% full"

fi

if [ "$used" -ge "$thresh" -a "$used" -le "$warn" ]; then

echo "WARNING: filesystem $i is $used% full"

fi

done

The first thing the script does is get the list of mount points from the mount program and filters out any mounted CD-ROM drives. It also doesn't take into account any special file systems like /proc or /sys, by making sure the mount point device is an actual device (by making sure the device name starts with /dev).

Next, set your threshold to something sensible, like 90% full. Adjust this to taste; this just generates a warning. Likewise, you set your critical warning value to 95%. Notice that the command typeset -i is


used to ensure that bash treats these numbers as integers, which you need for comparing the values obtained from df.

The $used variable gets set on each filesystem with the Use% column output from the df tool. This is compared to your thresholds.

The script simply outputs the warnings to standard output. The script can easily be adjusted to have it send an e-mail to someone should one of the thresholds be breached. To have this executed every 30 minutes via cron, edit the system crontab to include:

0,30 * * * * /usr/local/bin/diskmon.sh

In this case, all the warnings will be sent via cron to whoever receives cron's output mails.
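If you would rather have the script mail the alert itself instead of relying on cron's output mail, one hedged variation (the address is a placeholder) is to capture the output and pipe anything non-empty to mailx:

out=`/usr/local/bin/diskmon.sh`
if [ -n "$out" ]; then
    echo "$out" | mailx -s "disk space alert on `hostname`" admin@example.com
fi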

posted by Brahma at 9:25 AM 0 comments

FTP tutorial

There are also some higher ports above 1024 that FTP uses to display files and folders. FTP just grabs those ports whenever it needs them.

Following is a short FTP tutorial:

Active FTP vs. Passive FTP, a Definitive Explanation

Introduction

One of the most commonly seen questions when dealing with firewalls and other Internet connectivity issues is the difference between active and passive FTP and how best to support either or both of them. Hopefully the following text will help to clear up some of the confusion over how to support FTP in a firewalled environment.

The Basics

FTP is a TCP based service exclusively. There is no UDP component to FTP. FTP is an unusual service in that it utilizes two ports, a 'data' port and a 'command' port (also known as the control port). Traditionally these are port 21 for the command port and port 20 for the data port. The confusion begins, however, when we find that depending on the mode, the data port is not always on port 20.


Active FTP

In active mode FTP the client connects from a random unprivileged port (N > 1024) to the FTP server's command port, port 21. Then, the client starts listening to port N+1 and sends the FTP command PORT N+1 to the FTP server. The server will then connect back to the client's specified data port from its local data port, which is port 20.

From the server-side firewall's standpoint, to support active mode FTP the following communication channels need to be opened:

FTP server's port 21 from anywhere (Client initiates connection)
FTP server's port 21 to ports > 1024 (Server responds to client's control port)
FTP server's port 20 to ports > 1024 (Server initiates data connection to client's data port)
FTP server's port 20 from ports > 1024 (Client sends ACKs to server's data port)

When drawn out, the connection appears as follows:

In step 1, the client's command port contacts the server's command port and sends the command PORT 1027. The server then sends an ACK back to the client's command port in step 2. In step 3 the server initiates a connection on its local data port to the data port the client specified earlier. Finally, the client sends an ACK back as shown in step 4.

The main problem with active mode FTP actually falls on the client side. The FTP client doesn't make the actual connection to the data port of the server--it simply tells the server what port it is listening on and the server connects back to the specified port on the client. From the client side firewall this appears to be an outside system initiating a connection to an internal client--something that is usually blocked.

Passive FTP

In order to resolve the issue of the server initiating the connection to the client, a different method for FTP connections was developed. This was known as passive mode, or PASV, after the command used by the client to tell the server it is in passive mode.

In passive mode FTP the client initiates both connections to the server, solving the problem of firewalls filtering the incoming data port connection to the client from the server. When opening an FTP


connection, the client opens two random unprivileged ports locally (N > 1024 and N+1). The first port contacts the server on port 21, but instead of then issuing a PORT command and allowing the server to connect back to its data port, the client will issue the PASV command. The result of this is that the server then opens a random unprivileged port (P > 1024) and sends the PORT P command back to the client. The client then initiates the connection from port N+1 to port P on the server to transfer data.

From the server-side firewall's standpoint, to support passive mode FTP the following communication channels need to be opened:

FTP server's port 21 from anywhere (Client initiates connection)
FTP server's port 21 to ports > 1024 (Server responds to client's control port)
FTP server's ports > 1024 from anywhere (Client initiates data connection to random port specified by server)
FTP server's ports > 1024 to remote ports > 1024 (Server sends ACKs (and data) to client's data port)

When drawn, a passive mode FTP connection looks like this:

In step 1, the client contacts the server on the command port and issues the PASV command. The server then replies in step 2 with PORT 2024, telling the client which port it is listening to for the data connection. In step 3 the client then initiates the data connection from its data port to the specified server data port. Finally, the server sends back an ACK in step 4 to the client's data port.

While passive mode FTP solves many of the problems from the client side, it opens up a whole range of problems on the server side. The biggest issue is the need to allow any remote connection to high numbered ports on the server. Fortunately, many FTP daemons, including the popular WU-FTPD, allow the administrator to specify a range of ports which the FTP server will use.

The second issue involves supporting and troubleshooting clients which do (or do not) support passive mode. As an example, the command line FTP utility provided with Solaris does not support passive mode, necessitating a third-party FTP client, such as ncftp.

With the massive popularity of the World Wide Web, many people prefer to use their web browser as an FTP client. Most browsers only support passive mode when accessing ftp:// URLs. This can either be good or bad depending on what the servers and firewalls are configured to support.

Summary

The following chart should help admins remember how each FTP mode works:


Active FTP :
    command : client >1024 -> server 21
    data    : client >1024 <- server 20

Passive FTP :
    command : client >1024 -> server 21
    data    : client >1024 -> server >1024

posted by Brahma at 9:23 AM 0 comments

Friday, September 16, 2005

gnu date increment by 1

> Hi

> With gnu date, how can I get the list of dates continuously between a
> specific period, say for example apr 2,2005 to july 31,2005, in the
> format %Y/%m/%d

> command should be in the form of
> ./script -d1 04/02/2005 -d2 06/31/2005

ITYM 07/31/2005

> 2005/04/02
> 2005/04/03
> 2005/04/04
> ..
> ..
> ..
> 2005/07/29
> 2005/07/30/
> 2005/07/30

> Help is very much appreciated.

> Thanks,

Something like this should work:

now=`date +"%Y/%m/%d" -d "04/02/2005"`
end=`date +"%Y/%m/%d" -d "07/31/2005"`


while [ "$now" != "$end" ]
do
    now=`date +"%Y/%m/%d" -d "$now + 1 day"`
    echo "$now"
done

Note that date handles some invalid dates like "06/31/2005" by trying to map them to a close real date.
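One small caveat with the loop above: as written it never echoes the starting date itself. A minimal adjustment (same assumptions about GNU date being on the PATH):

now=`date +"%Y/%m/%d" -d "04/02/2005"`
end=`date +"%Y/%m/%d" -d "07/31/2005"`
echo "$now"
while [ "$now" != "$end" ]
do
    now=`date +"%Y/%m/%d" -d "$now + 1 day"`
    echo "$now"
done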

Regards,

Ed.

posted by Brahma at 3:50 PM 0 comments

sets network nic parameters value

Bhushan <[email protected]> wrote:> Add following line in /etc/system file

> set bge:bge_adv_autoneg_cap=0

> This sets the value permanently, even after reboot

That's the most inflexible way of doing it:

1. It requires a reboot to activate
2. It sets the value on all instances of the driver

You have more flexibility with ndd:

ndd -set /dev/bge0 adv_autoneg_cap 0

I hope Sun will make a consistent interface for all network drivers one day. The situation today is frustrating.

- some drivers have a single /dev entry, you have to set instance first
  then the parameter:
    ndd -set /dev/qfe instance 3
    ndd -set /dev/qfe adv_autoneg_cap 0
- others (like bge) create a separate /dev entry for each instance:
    ndd -set /dev/bge3 adv_autoneg_cap 0
- some drivers use completely different names
- some drivers (mostly x86) don't have a ndd interface at all.

posted by Brahma at 3:49 PM 0 comments


ether address on multi-port cards

ether address on multi-port cards

Why is the ethernet address the same for every port on a multiport ethernet card, and what are the implied consequences of this? Specifically the 4 bge interfaces in a V210, or the 4 ce interfaces on a quad gigaswift.

The system gets this hardware address from the EEPROM, not from the ethernet hardware.

> The system gets this hardware address from the EEPROM not from the> ethernet hardware.

not default on the v210, as it has:

local-mac-address?=true

try

# eeprom local-mac-address?=true

if you want quads to have different addresses.

v210: # ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 137.58.166.251 netmask ffffff00 broadcast 137.58.166.255
        ether 0:3:ba:da:a8:8f
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 137.58.167.251 netmask ffffff00 broadcast 137.58.167.255
        ether 0:3:ba:da:a8:90
bge2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 137.58.168.251 netmask ffffff00 broadcast 137.58.168.255
        ether 0:3:ba:da:a8:91
bge3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet 137.58.169.251 netmask ffffff00 broadcast 137.58.169.255
        ether 0:3:ba:da:a8:92

/jörgen

>Why is the ethernet address the same for every port on a multiport
>ethernet card, and what are the implied consequences of this?


>Specifically the 4 bge interfaces in a V210, or the 4 ce interfaces on a
>quad gigaswift.

That's because the PROM programs them that way; there are no supposed consequences as generally you don't have all cards in the same domain.

But all of our current interfaces have on-board private mac addresses you can use by setting:

eeprom local-mac-address?=true

and then rebooting.

The qfe cards have a "local-mac-address" property (and from the looks of it, so do many other cards) which reveals their programmed mac-address even if booted with local-mac-address?=false

prtconf -vp | grep local
local-mac-address?: 'true'
local-mac-address:  0003ba30.c0f4
local-mac-address:  0003ba30.c0f5
local-mac-address:  0003ba30.c0f6
local-mac-address:  0003ba30.c0f7
local-mac-address:  080020a9.f67e

posted by Brahma at 3:48 PM 0 comments

Solaris 8 hung on boot.

RE: Solaris 8 hung on boot.

If you want to determine what it is hanging on, you can:
- boot from CD
- mount the root FS
- edit /etc/rc2

find the section that looks like this:

for f in /etc/rc2.d/S*; do
    if [ -s $f ]; then
        case $f in
            *.sh) . $f ;;
            *)    /sbin/sh $f start ;;
        esac


    fi
done

and change the line with 'start' on it to look like this:

*) /sbin/sh -x $f start ;;

- unmount the FS, and reboot.

On the next start up, you will see a verbose output of what is actually running. You should then be able to determine what line the startup is hanging on. That will give us/you a better idea of what's wrong. Often, it's network related, like it can't resolve a name or address encountered in a config file somewhere.

Brian

posted by Brahma at 3:47 PM 0 comments

delete a file on logging UFS

When you delete a file on logging UFS, the file and its allocated resources are not freed immediately; what happens is that the directory entry gets zapped and the inode is put onto a so-called delete queue, and the delete thread will run from time to time to drain the queue (also on umount and lockfs -f requests).

When the delete queue gets drained, the inode and its allocated resources are freed. This is done within a transaction in the case of logging, and as part of this delete transaction the allocated blocks are freed as well, but are not immediately available for re-use as long as the whole transaction is not committed to the log.

This does not mean that a log roll must happen to be able to re-use those blocks; committing the entire transaction to the log makes them available for new allocations.

if you run out of space UFS will detect a possible ENOSPC failure and will forcibly drain the delete queue in this case to free up any resources still lurking around in the delete queue and re-do the allocation attempt before eventually failing with ENOSPC.
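So if the space is needed back right away after deleting a large file on a nearly full logging filesystem, forcing a flush with lockfs (mentioned above) will drain the delete queue and commit the transaction; a one-line sketch with an example mount point:

lockfs -f /export/home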

posted by Brahma at 3:46 PM 0 comments


cloning hard disk to new, bigger ones...


cloning hard disk to new, bigger ones...

Subject: cloning hard disk to new, bigger ones...

I'm using g4u (basically a clever script whose heart is dd) to clone old hard disks containing Interactive Unix (sigh) to new, bigger ones.

My question is: because the geometry of the disk differs, at least for the cylinder count, I feel that I should adjust some parameters to reflect the disk change. The free size is not an issue, but I would like to do things as best as possible...

I suspect I should modify the Partitions file, but how? Is it possible to use sysadmin to do that?

Thanks

Matthieu

Reply

> I'm using g4u (basically a clever script whose heart is dd) to clone old
> hard disks containing Interactive Unix (sigh) to new, bigger ones.

> My question is: because the geometry of the disk differs, at least for the
> cylinder count, I feel that I should adjust some parameters to reflect the
> disk change. The free size is not an issue, but I would like to do things as
> best as possible...

> I suspect I should modify the Partitions file, but how? Is it possible to
> use sysadmin to do that?

The classic method is with tar. Boot up into single-user mode, then initialize the new disk with your partitions. Then for each partition you want to move over, mount it (say in /mnt), and do something like this:

# cd /mnt
# tar cf - /var | tar xfpB -

That would have moved your var onto the new partition.


After you've moved everything over, don't forget to make the new disk bootable. On Solaris, the command would be 'installboot'; hopefully IU had something similar.
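On Solaris that step would look something like this (assuming the new boot disk is c0t1d0; adjust the device to match yours):

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0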

posted by Brahma at 3:46 PM 0 comments

Terminal Characters

stty erase ^H

tells the terminal driver upon which character it should perform the "backward erase character" operation in its built-in line editor.

That built-in line editor is only used when the terminal is in canonical mode, i.e. when the terminal sends the data to the reading application only when you press enter.

Typically, neither bash nor vim use that canonical mode as they need to be able to get each typed character as soon as they are typed. But vim may want to query the terminal driver to find out what character is supposed to be the backspace character (if it's good for the terminal builtin editor, then it will be good for vim).

bash doesn't care. It treats both ^? and ^H as a backspace character.

What I think is that when you press <Backspace>, you get a ^? character. So, you should instruct your terminal driver (and indirectly vim) that "erase" is ^? with stty erase '^?'.

> Now I am using an application, it has its shell, in pressing UP
> arrow key, I see ^[[A, I look through manpage of stty, I do not see how
> to redefine UP arrow key.

stty is not to redefine keys, it's to tell the terminal driver on which received character it should perform its special operations. It works only for single characters. ^[[A is a sequence of 3 characters.

And the builtin terminal line editor has few operations you can map a key to, mainly "backward erase character" (^H, ^?), "backward erase word" (^W), "kill line" (^U), "interrupt" (^C), "quit" (^\)...


If you need a more complex line editor, then you need to implement one just as bash (readline) or vim do. That means: leave canonical mode (stty -icanon). Tell the terminal driver you want to receive characters as soon as they are typed: stty min 1 time 0 (you may want to disable the signal keys as well, stty -isig, or the automatic conversion of CR into NL, stty -icrnl... stty raw may even be what you want). Read characters one at a time (c=$(dd bs=1 count=1 2> /dev/null)), then do the processing for each character: if it's a ^[, then read the next one; then if it's a "[", read the next one; then if it's an "A", perform whatever you want to be done on the pressing of the Up key.
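A rough Bourne shell sketch of that approach (illustrative only; error handling omitted):

old=`stty -g`                      # save the current terminal settings
stty -icanon -echo min 1 time 0    # leave canonical mode, read byte by byte
esc=`printf '\033'`
c=`dd bs=1 count=1 2>/dev/null`
if [ "$c" = "$esc" ]; then
    c2=`dd bs=1 count=1 2>/dev/null`
    c3=`dd bs=1 count=1 2>/dev/null`
    [ "$c2" = "[" ] && [ "$c3" = "A" ] && echo "Up arrow pressed"
fi
stty "$old"                        # restore the terminal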

--

>> Hi,
>> I am using BASH 2.05 on Red Hat Linux, I have the following command
>> in .bashrc,

>> stty erase "^H"

> The linux console and xterm both produce a DEL character (^?) when
> you hit the big backspace key.

[...]

For xterm, it depends on the implementation, on the default value of the terminal driver and on the configuration.

posted by Brahma at 3:45 PM 0 comments

find comand

When you use a pipeline to run vi like this, its stdin isn't the terminal any more. That's what "Input not from terminal" means. The input is coming from the pipe instead.

Try

vi `find . -name config.xml`

or

find . -name config.xml -exec vi {} \;

or


find . -name config.xml | while read f;do vi $f;done

Joe

Suppose we want to compress all .txt files in an entire subtree. Simple enough, using what we've learned before:

find /path/to/tree -name '*.txt' | xargs gzip

Another extremely useful way find can sift through files is to find files created or modified recently. Often you want to know what has changed recently. For instance, to list all of the files in your home directory that changed within the past two days:

find ~/ -mtime -2

To find the files that haven't been modified in the past two days, you can change the -mtime parameter:

find ~/ -mtime +2

You can also select files by the last time they were accessed (atime) or created (ctime). Like bash's test command, find has a wide variety of options; reading the manpage is advised (not just for reference, either; it will give you an idea of the flexibility of this peculiar command).
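For example (same pattern, different time predicates):

find ~/ -atime -1       # files accessed within the past day
find ~/ -ctime +30      # files whose inode changed more than 30 days ago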

posted by Brahma at 3:44 PM 0 comments

ownership for all the files pkgchk

> All,
>
> By mistake someone changed the ownership for all the files under /etc,
> now no one can access that system anymore.
>
> Is there anyway to recover /etc?

Yes.

What you're looking for is the pkgchk command.

First, find out what packages reference /etc:


pkgchk -l -p /etc

Then take each package listed in that output and run:

pkgchk -af <package1> <package2> ... and so on.
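If only a handful of files are wrong, you can also point pkgchk at a single path (illustrative; -f corrects attributes where possible):

pkgchk -f -p /etc/passwd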

/dale

posted by Brahma at 3:43 PM 0 comments

Setting Up the Solaris OS to Work with Cable Modems

Setting Up the Solaris OS to Work with Cable Modems

Introduction

This tip is written to help you configure a machine running the Solaris OS as a DHCP client to work with a DSL/cable modem using the Dynamic Host Configuration Protocol (DHCP). DHCP provides the IP address, default route, and name servers. You need Solaris 2.6 or higher for DHCP.

Setup Procedure

The following steps are required as root:

1. touch /etc/dhcp.hme0

   Replace the .hme0 with whatever the Ethernet interface for your system might be, as shown by ifconfig -a.

2. cp /dev/null /etc/hostname.hme0

   or

   > /etc/hostname.hme0

   Important note: You need to make sure that this file is empty -- otherwise, the DHCP configuration won't work.

3. Make sure that /etc/inet/hosts only has one line in it, the one containing 127.0.0.1 localhost. Any other lines will be ignored, and any additional necessary lines will be added by the DHCP client at boot time.

4. touch /etc/notrouter

   This creates a file to tell the Solaris OS that your system will not be performing routing or packet-forwarding duties. If this file is already there, leave it the way it is.

5. cp /dev/null /etc/defaultrouter

   Since the DHCP client software will automatically put the needed entries in this file, we just need to make sure that it exists as an empty file. If it already exists, rename it and create the empty file in its place.

6. cp /dev/null /etc/resolv.conf

   The DHCP client will add the necessary entries. If you already have this file, rename it and create an empty file in its place.

7. Check the file /etc/nsswitch.conf and look at the hosts: line. Make sure that it reads hosts: files dns. This will enable your machine to resolve addresses using DNS, the Domain Name System.

Reboot your machine. While booting, you will see status messages about the DHCP client; this is normal. Once the machine is booted, type the ifconfig -a command. You will see output similar to this:

lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index
        inet 192.168.1.35 netmask ffffff80 broadcast 192.168.1.255
        ether 8:0:20:d2:15:2f

Conclusion

If you have followed all my instructions, the Solaris machine is readyto get its networking information by means of DHCP.

posted by Brahma at 3:42 PM 0 comments

debug a process

Thanks everyone. Again I had lots of replies, and I will copy the most complete answers below:


[Ric Anderson]

There are many:

truss -p pid
shows what system calls a process is making.

ps -fp pid
shows general status.

pldd pid
shows what dynamic libraries (.so files) a process is using.

In addition, there are a number of tools in /usr/proc/bin that let you look at the actions of a process.

If you have lsof installed, then
lsof -p pid
will show open files for a process.

Some debuggers (e.g., gdb) can also be used with a pid to interactively debug a running process.
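For instance, a few of the proc tools that ship with Solaris:

/usr/proc/bin/pfiles <pid>      # open files and their flags
/usr/proc/bin/pstack <pid>      # stack trace of each thread
/usr/proc/bin/pwdx <pid>        # current working directory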

[Jonathan Birchall]

It all depends on how deep you wish to go and what version of Solaris you are using.

truss -wall -vall -fall -o <outputfilename> -p <PID> will give you details of system calls etc. See man truss.

You can also use the p commands; see man pfiles, ptree, pmap etc.

In Solaris 10 you can also now use dtrace for a detailed view of what the process is doing.
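For example, a hypothetical one-liner that counts system calls by name for a single process:

dtrace -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }' -p <PID>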

Using a mixture of the above should give you a fairly detailed view of what is happening.

You can also use adb (debugger)... which also pointed me to mdb (Solaris 9).

With Ric's help, using lsof, we managed to track down the problem.

posted by Brahma at 3:41 PM 0 comments

swap


Jason Stout wrote...
> The space problem is a good guess but I don't think thats it. The
> OpenBSD box has PLENTY of storage space free. The Solaris box is
> only running about 50% on the disk I'm ftp'ing to and /tmp is
> fine. df -k doesn't show me swap space so maybe that's it. I
> don't even know how to check swap space usage. Any help?

solaris$ swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t3d0s1   32,25      8 196552 182328

OR

solaris$ swap -s
total: 10208k bytes allocated + 3720k reserved = 13928k used, 169840k available

Check to make sure that /tmp is being mounted as a swap filesystem.

solaris$ grep tmp /etc/vfstab
swap    -       /tmp    tmpfs   -       yes     -

If not, make sure that you're ftp'ing the file to a filesystem that has enough free space... unless you have more than 900mb of swap space, and you're ftp'ing a 900mb file to /tmp... that would explain the problem.


swapctl -l ; see swapctl(8)

posted by Brahma at 3:40 PM 0 comments

process can not be stopped. And it keeps the file always opened

mp123 <[email protected]> wrote:
> Hello all,

> In a real time system, mission critical, there is the following
> problem:

> One process is writing a file with "log" information (now useless). The
> process can not be stopped. And it keeps the file always opened for
> writing.

It depends how the file was opened by the running process.

If the file was opened with the O_APPEND flag, you can safely truncate the file:

cp /dev/null /some/log/file

If you do the above on files which will be written to without the O_APPEND flag used on open(), further log writes will write at the end of the old offset. In this case you end up with a sparse file, empty (null bytes) in the first few megabytes, and the real log messages somewhere behind.
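To check how the process opened the file before truncating, something like this can help (assuming a release where pfiles prints the open flags):

pfiles <pid> | grep -i append     # look for O_APPEND on the log file's descriptor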

posted by Brahma at 3:39 PM 0 comments

Add new disks to an existing Sol9

Eric Webb wrote:
> Hi guys,
>
> Need basic pointers/starters on how to add new disks to an existing Sol9
> install. I have two extra drives I'm adding to an E250, the idea is to use
> them as a mirrored pair for data. Coming from other *nixes, I'm kinda lost
> without an LVM... ;)

Insert disks. Create the device nodes:

host:root:/:1 # devfsadm -v -c disk

Run format. It should find two new disk devices. Let's say they're c0t1d0 and c0t2d0. Label them if necessary and set up the following slices:

slice 0 -- entire disk except the last 50MB, tag usr, flags wm
slice 2 -- entire disk, tag backup, flags wu
slice 7 -- last 50MB, tag usr, flags wm

Remember to write the new label and partition map to the disks.

Assuming you're not already running metadevices, create metadb replicas:

host:root:/:2 # metadb -a -f -c 3 c0t{1,2}d0s7


I suggest creating either one or three metadb replicas on your boot disk as well, if you have a small extra slice you can use for the purpose. The Solaris metatools require that half the total metadb replicas, rounded down, plus one, must be available at boot time.

Set up the metadevices using metatools:

host:root:/:3 # metainit d11 1 1 c0t1d0s0
host:root:/:4 # metainit d12 1 1 c0t2d0s0
host:root:/:5 # metainit d10 -m d11 d12

Create a filesystem on the mirror:

host:root:/:6 # newfs /dev/md/rdsk/d10

Set up a line in /etc/vfstab to mount the filesystem where you want it. Remember to use /dev/md/dsk/d10 and /dev/md/rdsk/d10 for the devices. Logging is a good idea. Your /etc/vfstab line should end up looking something like this:

/dev/md/dsk/d10 /dev/md/rdsk/d10 /data ufs 1 yes logging

Then just mount the new filesystem:

host:root:/:7 # mount /data

And you're done.
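As an optional sanity check (not part of the original reply), metastat shows the mirror and submirror state:

host:root:/:8 # metastat d10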

--
Phil Stracchino

> http://docs.sun.com/app/docs/doc/816-4520 - Solaris Volume Manager
> Administration Guide.

There are a number of good step-by-step guides out there on the web to go along with the official docs for SVM, or "DiskSuite" for us old timers:

http://www.unixway.com/vm/disksuite/index.html
http://slacksite.com/solaris/
http://www.adminschoice.com/docs/P_Solstice_disksuite.htm
http://sysunconfig.net/unixtips/solaris.html#svm

I still refer to these to remind myself of syntax when doing things that are (hopefully) rare, like replacing failed disks.


posted by Brahma at 3:37 PM 0 comments

SunFIRE 280R: Howto disable a memory module using the Solaris 9

Subject: Re: SunFIRE 280R: Howto disable a memory module using the Solaris 9

> Hello,
> we have a SunFIRE 280R with a faulty memory module. How can we disable
> this memory module until we're able to replace it?

Pop the box and remove the bank. To my knowledge you can't disable a bank of memory in a 280R; best bet is to let Sun replace it. You have enough soft errors to get it replaced, and you also had a double bit error which led to the UE event. The CEs don't break the system, so that can be a scheduled replacement, but if you have UEs often you should get a replacement a.s.a.p.

I don't know your level of service with Sun (if any), but if you have one, check with them for a possible FCO match.

Uncorrectable system bus (UE) Event

Corrected system bus (CE) Event

The FE handbook on sunsolve shows which are which, and I believe the system board as well shows the component name. However, you need to pop the whole bank of four DIMMs if you do, since they work in quads in the 280R.

So you need 4 identical DIMM Modules in at least one Bank to boot the machine, otherwise you get messages like that:

This happens when you have one DIMM missing in a Bank:

CPU seeprom format: 0000.0000.0000.0001
ERROR: set-bank-present: DIMMs or DIMM FRUproms missing!
Powering off system!

This happens when you have mixed different DIMMs in one bank:

CPU seeprom format: 0000.0000.0000.0001
Corrected ECC Error
ok Corrected ECC Error
ok Corrected ECC Error
...

posted by Brahma at 3:36 PM 0 comments


Watch/view shell script lines during execution?

Watch/view shell script lines during execution?

> Any suggestions? I'm primarily running Solaris with occasional forays
> into Linux.

You didn't say which shell.

If it's a Bourne shell variant, you can use two variables

-v   Print shell input lines as they are read.
-x   Print lines after expanding values.

There are several ways to do this

1)
sh -x command arguments
sh -v command arguments

You can combine this with
sh -xv command arguments

2) start the first line with
#!/bin/sh -x
#!/bin/sh -v
#!/bin/sh -xv

3) In the middle of the script
set -x (turns it on)
set +x (turns it off)
set -v
set +v

For csh/tcsh, the first two techniques work

csh -xv command arguments
#!/bin/csh -xv

In addition, you can use -X or -V, which turns the flag on before executing ~/.tcshrc or ~/.cshrc.

For in-the-middle-of-the-script, use
set echo
set verbose
unset echo
unset verbose

OR

For ksh variants, set PS4:

#!/bin/ksh
PS4='[$LINENO]+'

Create a function:

# * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
function debugOn {
    typeset -ft $(typeset +f)
}

And include a getopts such as:

# * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
while getopts "d" C
do
    case $C in
    d) debugOn ;;
    esac
done

The advantage of this over the "-x", "set -x", echo, etc already mentioned is that you don't have to insert/delete junk when you want to see what's going on, and it works for functions where the "-x" on the first line doesn't.

posted by Brahma at 3:33 PM 0 comments

script to get all the relevant parameters from eri

gg@radier:/export/home/ghost/bin# more ndd_eri
#!/bin/sh

#script to get all the relevant parameters from eri

for pram in `ndd /dev/eri \? | grep -v '?' | awk '{print $1}'`
do
echo "$pram \c"
ans=`ndd -get /dev/eri $pram`
echo $ans
done

echo "##/$SUNdata (`ifconfig hme0 | grep inet| awk '{print $2}'`)

posted by Brahma at 3:33 PM 0 comments

Is input from file or standard in?

Is input from file or standard in?

Is there a way to check if the input to a Bash script is from a file or standard in?

Reply

> Hi,

> Is there a way to check if the input to a Bash script is from a file or
> standard in?

man test

look at the -t option.
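A minimal sketch of that test inside a script:

if [ -t 0 ]
then
    echo "stdin is a terminal"
else
    echo "stdin is a file or a pipe"
fi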

Bill Seivert

Reply

posted by Brahma at 3:28 PM 0 comments

solaris nfs cleanup script

hi, this is solaris nfs cleanup script
--------------------------------------
# crontab -l | grep nfs
15 3 * * 0 /usr/lib/fs/nfs/nfsfind
--------------------------------------
# more /usr/lib/fs/nfs/nfsfind
--------------------------------------
#!/bin/sh
# Copyright (c) 1993 by Sun Microsystems, Inc.

#ident  "@(#)nfsfind.sh 1.7     00/07/17 SMI"
#
# Check shared NFS filesystems for .nfs* files that
# are more than a week old.
#
# These files are created by NFS clients when an open file
# is removed. To preserve some semblance of Unix semantics
# the client renames the file to a unique name so that the
# file appears to have been removed from the directory, but
# is still usable by the process that has the file open.

if [ ! -s /etc/dfs/sharetab ]; then exit ; fi

# Get all NFS filesystems exported with read-write permission.

DIRS=`/usr/bin/nawk '($3 != "nfs") { next }
        ($4 ~ /^rw$|^rw,|^rw=|,rw,|,rw=|,rw$/) { print $1; next }
        ($4 !~ /^ro$|^ro,|^ro=|,ro,|,ro=|,ro$/) { print $1 }' /etc/dfs/sharetab`

for dir in $DIRS
do
        find $dir -type f -name .nfs\* -mtime +7 -mount -exec rm -f {} \;
done
---------------------------------------------------
/Jörgen

posted by Brahma at 3:27 PM 0 comments

Cross-over connection between hosts

Cross-over connection between hosts

I have 2 Sun Fire V240 machines with a Gigabit card on each machine (we are not using the onboard quad card). We need to transfer data between these two machines. Since we don't have a Gigabit switch we connected a cross over cable between the 2 Gigabit card network interfaces. I have done the following to make these 2 interfaces communicate, but no luck.

On Machine A:

ifconfig ce0 plumb
ifconfig ce0 192.168.0.1 255.255.0.0 broadcast + up

route add net 192.168.0.0 192.168.0.1 1

On Machine B:


ifconfig ce0 plumb
ifconfig ce0 192.168.0.2 255.255.0.0 brodcast + up

route add net 192.168.0.0 192.168.0.2 1

But if I run the command 'ping 192.168.0.2' from Machine A it says host unreachable. Same problem if I run the command 'ping 192.168.0.1' from machine B.

What have I done wrong here? Any help would be appreciated.

Thanks.

Adding the routes is not necessary, since there is no router between the machines. This may do more harm than good.

Also, please post the output of:
/usr/sbin/ndd -set /dev/ce instance 0
/usr/sbin/ndd -get /dev/ce link_status
on both machines.

The ability to communicate peer-to-peer over crossed UTP may depend on the "tpe-link-test?" variable in OBP (although I am not quite sure if this still applies to the "ce" interface).

Regards,

If I don't add routes it will use the default route to communicate, which is not 192.168.0.x. I can ping 192.168.0.1 from Machine A, so the interface is up. Similarly I can ping 192.168.0.2 from Machine B.

Anyway I'll post the ndd output. Thanks.

> If I don't add routes it will use the default route to communicate which
> is not 192.168.0.x.

No, when the "ifconfig" command brings the interface up, it will add a route to that network only, based on the netmask, and since that route is more specific than the default route, it will use that route (and that interface) for traffic that goes directly to that network.

> I can ping 192.168.0.1 from Machine A, so the
> interface is up. Similarly I can ping 192.168.0.2 from Machine B.


Yes, that means the kernel has associated an IP address with that interface and will recognize packets with that destination IP address as local packets. But that does not mean that the device driver for your ethernet interface is able to do anything productive, or even that you have an electrical connection of any type on the ethernet hardware. (The interface can be in the "up" state without either of those things being true.)

My advice is to just go get a gigabit switch. It would be nice to use professional grade equipment for a situation like this, but even cheap consumer grade equipment is probably better (in the sense of reliability and maintainability) than putting together a system that obviously requires careful manual configuration just to get the machines to talk to each other.

However, if a switch isn't possible for some reason, another idea is to try the same crossover cable on the built-in ethernet instead of the gigabit ethernet. I don't know all the intricacies and quirks of autonegotiation on gigabit vs. fast ethernet, but it seems possible that it might be easier with the slower interface type. And, if you can get it to work there, then you have proved that your cable is not totally defective, and you've proved that you know the right configuration commands except for what to do specifically on a pair of gigabit ports.

- Logan

Thanks a lot for your post. I don't have a Gigabit switch so that is not an option. That is the reason we are using a cross over cable between the 2 machines. I'll try the same on the built-in ethernet card, which is also a 10/100/1000 Base T ethernet quad card. I'll remove the manual routes and try this.

> Thanks a lot for your post. I don't have a Gigabit switch so that is
> not an option. That is the reason we are using a cross over cable between
> the 2 machines.

Your crossover cable -- is it a crossover for 100base-T or is it for 1000base-T? 100base-T doesn't require all 4 pairs to be crossed while 1000base-T does, so it could be that the cable you have doesn't have all pairs crossed.

I think your netmask is wrong; you're using a Class C private address, 192.x.x.x, so your netmask should be 255.255.255.0.


If this didn't work, to test the network connection go to the Open Boot prompt and use the test-net command.

posted by Brahma at 3:27 PM 0 comments

check if the shell is interactive or not in the profile

On Sun, 11 Sep 2005 10:08:49 +0200, <[email protected]> wrote:
> Barry,
> how can I check if the shell is interactive or not in the profile?

What I say here applies to the Bourne Again shell (bash), but I believe ksh is similar:

PS1 is set and $- includes an 'i' if the shell is interactive.

if [ "${PS1+set}" = set ]
then
    echo interactive
else
    echo non-interactive
fi

and

case $- in
*i*) echo interactive;;
*)   echo non-interactive;;
esac

or more briefly:

[[ $- =~ i ]] && echo interactive || echo non-interactive

The shell will be interactive if

a) both stdin and stdout are connected to a terminal,
b) the shell was not started with non-option command line arguments, and
c) the shell was not started with the -c option.

So,

if [[ -t 0 && -t 1 && $# -eq 0 && ! $- =~ c ]]
then
    echo "Yessur, a, b, and c."
fi


since the "=~" operator exists in [[...]] but not in [...]. People are often happy to test only [ -t 0 ], which works as intended in most cases.

> Is it something about the tty?

Yes.

-Enrique

posted by Brahma at 3:25 PM 0 comments

delete an unusually named file using its i-node value

How can I delete an unusually named file using its i-node value?

Assuming the inode is 12345:

find . -inum 12345 -exec rm {} \;
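If you want to be careful, list the match first to confirm it is the file you mean:

find . -inum 12345 -exec ls -l {} \;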

posted by Brahma at 3:25 PM 0 comments

Removing old files that contain a phrase using crontab

Removing old files that contain a phrase using crontab

00 4 * * * find /sap_basis/log -name "decrypt_DH9383PS.log*" -mtime +14 -exec sh -c 'grep "No Matches" $1 >/dev/null 2>&1 && rm $1' {} {} \;

This is a tiny shell script (written inline), which runs grep and, if that reports success, then runs rm. The doubled {} are to handle the case that $0 is assigned the first argument, and $1 the second one.

Just for fun, a further refinement: grep -l will output file names only, and each file at most once, obviating the need to cut and sort -u.

find /sap_basis/log -name "decrypt_DH9383PS.log*" -mtime +14 | xargs grep -l "No Matches" | xargs rm

posted by Brahma at 3:24 PM 0 comments

How to show shmsys:shminfo_shmse value?

Subject: Re: How to show shmsys:shminfo_shmse value?

> One value is missing in /etc/system, though: shmsys:shminfo_shmse. I


^^^^^^^^^^^^^^^^^^^^

Here is a little typo as the ending ``g'' is missing.

> don't know the current value and do not want to risk reducing the
> parameter. The FAQ gives nothing.

You can examine and set these parameters dynamically using mdb [1]:

thales# echo shminfo_shmseg/D | mdb -kw | tail -1
shminfo_shmseg: 200
thales#
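The same mechanism can change the value on the fly (illustrative only -- 400 is just an example value, and writing kernel variables with mdb -kw deserves care):

thales# echo 'shminfo_shmseg/W 0t400' | mdb -kw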

Andreas.

posted by Brahma at 3:23 PM 0 comments

"split /swap between 2 disks": HOW? (and, "GOOD?")

Subject: Re: seen-advice: "split /swap between 2 disks": HOW? (and, "GOOD?")

> Various times I've seen the advice, when setting
> up partitions, to "split" /swap between two disks.

As a rule, that is good advice, because Solaris will use the slices in a round-robin manner, spreading the I/O load (so disks on different controllers are ideal).

> Just how does one do that?

> ie, what commands do you give?

Assuming you've partitioned the other disk with format, the easiest way to add the swap is to edit /etc/vfstab. Right now this machine has one swap slice, on c1t1d0s1:

/dev/dsk/c1t1d0s1 - - swap - no -

If you add (say) target 2, you'd add another line like this:

/dev/dsk/c1t2d0s1 - - swap - no -

Assuming you want to put swap on slice 1 of that disk too.


You can also add swap on the fly using the swap command, but it will go away when you next reboot if you don't add it to /etc/vfstab.

E.g., swap -a /dev/dsk/c1t2d0s1 will add the above device.
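The reverse operation, should you ever need it (same example device):

swap -d /dev/dsk/c1t2d0s1      # removes the slice from swap use once its pages can be moved off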

Subject: Re: seen-advice: "split /swap between 2 disks": HOW? (and, "GOOD?")

> On Fri, 9 Sep 2005, David Combs wrote:
>> Various times I've seen the advice, when setting
>> up partitions, to "split" /swap between two disks.
> As a rule, that is good advice, because Solaris will use
> the slices in a round-robin manner, spreading the I/O
> load (so disks on different controllers are ideal).

To note one exception to the rule: if reliability is more important than performance, making a mirror for all slices, including swap, is probably the better idea. Just because swap is strictly non-persistent data doesn't mean you want to lose it in-flight.

My guess for an unrecoverable defect on swap would be a kernel panic; never had it though.

Yours, Bernd

posted by Brahma at 3:21 PM 0 comments

command line arguments

"$@" produces the list of positional parameters (command line arguments).

$@ and $* are similar, but behave differently in some cases:

echo 'for word in "$@"; do echo "<$word>"; done' > /tmp/trash
sh /tmp/trash "A couple of sins" "A spaced sentence" some single words

produces

<A couple of sins>
<A spaced sentence>
<some>
<single>
<words>


but if you replace "$@" with "$*" you get

<A couple of sins A spaced sentence some single words>

and if you drop the quotation marks, as in "for word in $*; do..."

<A>
<couple>
<of>
<sins>
<A>
<spaced>
<sentence>
<some>
<single>
<words>

Without the quotation marks it does not matter if you use $* or $@.

posted by Brahma at 3:20 PM 0 comments

mount the CD:

If the LVM (Logical Volume Manager) is running (it should be started automatically at bootup) all you will need to do is put the CD into the CD caddy and the LVM will mount it automatically. If the LVM is not running, follow these steps to mount the CD:

1. Open the CD door and load the CD onto the caddy.
2. su to root (su -)
3. cd /
4. Verify the mount point /cdrom exists. If it does not exist, create it using "mkdir cdrom".
5. Mount the CD with the following command:

% mount -r -F hsfs /dev/dsk/c0t6d0s0 /cdrom

6. If you use "cd /cdrom", you should see the data on the CD.
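When you are finished with the CD (not part of the original steps; assumes nothing is still using /cdrom):

# umount /cdrom
# eject cdrom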

Let us know if you are successful.

posted by Brahma at 3:19 PM 0 comments

saving a set of data from a file to another file in Vi editor

Re: saving a set of data from a file to another file in Vi editor


What do you mean by save? Save to a new file?

Try:

:25,45w/some/path/filename
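A related ex trick, in case you want to append the same range to an existing file instead:

:25,45w >> /some/path/filename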

Brian

>> Hi Gurus,
>>
>> Do you know in Vi, how to save the data from one line to another?
>> I mean I want to save data from line 25 to line 45, whereas the file
>> contains 500 lines.
>> How I can achieve this? I want to do this when I am in vi in the edit/read
>> only mode.
>>
>> Thanks & Rgds,
>> Narayan

posted by Brahma at 3:19 PM 0 comments

Within sorted files, determine if gap exists

Re: Within sorted files, determine if gap exists

Hi, Try this:

katiyar@/export/home/katiyar/test> cat sorted_file
HRND.A050401000097OCC_177.raw.747.OCA
HRND.A050401010001OCC_177.raw.748.OCA
HRND.A050401020005OCC_177.raw.749.OCA
HRND.A050401030006OCC_177.raw.750.OCA
HRND.A050401040007OCC_177.raw.751.OCA
HRND.A050401050008OCC_177.raw.752.OCA
HRND.A050401060009OCC_177.raw.753.OCA
HRND.A050401070010OCC_177.raw.754.OCA
HRND.A050401080011OCC_177.raw.755.OCA
HRND.A050401090012OCC_177.raw.756.OCA
HRND.A050401100013OCC_177.raw.757.OCA
HRND.A050401110014OCC_177.raw.758.OCA
HRND.A050401130016OCC_177.raw.760.OCA
HRND.A050401140017OCC_177.raw.761.OCA
HRND.A050401150019OCC_177.raw.762.OCA
HRND.A050401160020OCC_177.raw.763.OCA
katiyar@/export/home/katiyar/test> cat script.sh
next=2
prev=1
lines=`wc -l < sorted_file`
while [ $next -ne $lines ]
do
val1=`eval sed -n \'$prev p\' sorted_file|cut -d. -f4`
val2=`eval sed -n \'$next p\' sorted_file|cut -d. -f4`
val1=`expr $val1 + 1`
if [ $val1 -ne $val2 ];then
echo "Missing file number : $val1"
fi
prev=$next
next=`expr $next + 1`
done
katiyar@/export/home/katiyar/test> ./script.sh
Missing file number : 759
katiyar@/export/home/katiyar/test>

Hope it helps,

Hi Vram,
Actually I am not very comfortable with awk :)) ..... Now, to answer your question regarding '$prev p': in order to print a particular line of a file, say the 5th, through sed we write

sed '5 p' filename

Similarly I have the line number stored in the prev variable and want to use it in sed, but since $ has a special meaning in sed I need to evaluate $prev first and then pass that value to sed. An 'eval' before sed does this for me. The -n option is used to avoid printing each line twice. Hope I am clear. Feel free to revert back if there are still any doubts.
Hope it helps.

Re: Within sorted files, determine if gap exists

In awk it would look something like this:

awk -F. '{curr=$4}NR!=1&&prev+1!=curr{print "Gap Between: "prev""curr}{prev=$4}'


The output from your example input would be:

Gap Between: 758 760

Brian

- No " after curr. The quotes balance now: one set around "Gap Between: " and one set around a space between prev and curr.

I haven't tested this, but this _might_ work where the numbers wrap:

awk -F. '{curr=$4; if (curr0){curr=1000}}NR!=1&&prev+1!=curr>> {print "Gap Between: "prev" "curr}{prev=curr00}'

If your file is sorted, then won't 000 come first instead of after 999?

Brian

To print out the entire filename change to:

{print "Gap Between: "prev" "curr" at line: "$0}

That will print the line AFTER the gap.

Brian

posted by Brahma at 3:18 PM 0 comments

Relationship between load average and CPU busy or CPU idle

Subject: Re: Relationship between load average and CPU busy or CPU idle

js wrote:
> Is there some kind of relationship between the load average figure and CPU
> busy / idle percentage?

Yes, there's SOME relationship, but it's not a really simple one.

> Thus, what I am concluding so far is that a load average nearing N-CPUs will
> have a very low CPU idle %.

Every thread on the machine can either be runnable or not. For example, if you call "sleep(100);" in a thread, then for the next 100 seconds, the thread isn't runnable. Likewise if you're waiting on the network. But if your thread is in the middle of doing processing (actually on a processor) or in principle *could* be doing processing if a processor were available, then it's runnable. To make matters a tad bit more complicated, there is a third state on Unix systems, which is that your thread is blocked because of some short-term I/O, like reading from or writing to a disk. Because Unix expects disk I/O to finish really soon (at which point your thread will become runnable again), this is basically counted as "almost runnable".

So, the load average is computed by periodically looking at the state of all the threads and counting the number of ones that are running, runnable but not running, and "almost runnable" (in short-term disk wait). But the load average is also a decaying average over a certain interval (1 minute, 5 minutes, or 15 minutes), so it doesn't necessarily reflect what the situation is at any given moment.

Meanwhile, idle time is computed by looking at what the processors are doing. The system keeps statistics about what percentage of the time the CPU is running, idle, etc. It does this by periodically waking up (via an interrupt) and gathering statistics about each CPU for a certain instant in time. It puts together lots of these samples to get a more accurate picture of how much time is spent doing what. But the stuff with short-term disk wait makes things complicated again. As I understand it, if there is ANY outstanding short-term disk wait (for ANY processor!), then at the time that routine takes its sample, it counts ALL processors that are idle as in short-term disk wait instead. So, in a 4 processor system, one processor waiting for short-term disk I/O and 3 processors idle will be counted just the same as if all 4 processors were really waiting for short-term disk I/O.

So, to recap:

* load average means periodically sample the run queue and determine how many threads are running, runnable, or "almost runnable" (short-term disk wait), then make a decaying average of this.

* CPU idle/system/user means periodically sample all the CPUs and determine what percentage of the time they're running and not, but if a CPU is idle and it or ANY OTHER CPU has a short-term disk wait going on, count that as disk wait (non-idle).

So yes, your conclusion is basically correct. If you have a load average of N, and if you have N processors, you are going to have a low idle percentage. This assumes you don't have a processor set defined that makes things more complicated: you could have 1000 CPU-bound threads that do no I/O at all, but using a processor set, put all 1000 of them on a single processor in a 4 processor system, and run nothing else so that the other 3 processors are idle. Then your load average would be 1000, but your idle percentage would be 75%.

One more note: the "any other CPU" thing means something a bit funny for a multiprocessor system that does I/O. If you have one single thread that is doing nothing but disk I/O on a 4 processor system, you might expect to see 75% idle, but you won't. You'll see more like 0% idle and 100% iowait. That's because one of the CPUs is basically perpetually in iowait, and though the others are perpetually idle, they are all getting counted as in iowait by the routine that collects the statistics. I guess if you didn't know this, you could get an inaccurate sense of the I/O load on a multiprocessor system. (A better measure would probably be whether I/O service times tend to increase as the server's load grows.)

By the way, I've based a lot of the above on the Solaris Internals book and on http://sunsite.uakom.sk/sunworldonline/swol-08-1997/swol-08-insidesol... . Both of these are several years old, so it's possible something has changed in more recent versions of Solaris. (It's also possible I've totally misunderstood everything...)

The "load average" is the number of runnable jobs; so it would stand to reason that if the load average is >= # of CPUs then your CPUs will be busy.

Casper
--

posted by Brahma at 3:17 PM 0 comments

Produce a list of the home directories of the listed users.

> *********************************************
> #!/bin/sh
> # HomeDir [user]...
> #
> # Produce a list of the home directories of the listed users.
> # If no users are given, produce all home directories.

Hi,
Try this script..... assuming your group id is stored in $grp:

IFS='^J'
for i in `cat /etc/passwd`
do
if [ `echo $i|cut -d: -f4` -eq $grp ];then
user=`echo $i|cut -d: -f1`
mkdir /home/$user
fi
done

Hope it helps......... note that you can't simply cut from the passwd file, otherwise you will be creating directories for system accounts too.

Something like this should work:

cat /etc/passwd | awk -F: '{print "mkdir "$6" && chown "$1":"$4" "$6}' | sh

Try it without the "|sh" first to see if it looks correct. It should attempt to create the directory, but if it already exists, it will get an error and not do the 'chown'. If it doesn't exist (and the mkdir succeeds), it should do the chown.

posted by Brahma at 3:16 PM 0 comments

script to autosend customer report to Ops & Sys Admin

=========================================
#!/usr/bin/ksh
# script to autosend customer report to Ops & Sys Admin

BASEDIR=/usr/local/mrtg/customer
ZIP=/usr/bin/zip
UUENCODE=/usr/bin/uuencode
MAILTO="[email protected] [email protected]"

TODAY=`date +"%d%m%y"`
cd $BASEDIR
for DIR in `ls -l | egrep "^d" | awk '{ print $9 }'`;do
FILENAME=$DIR$TODAY
$ZIP -r $FILENAME $DIR > /dev/null 2>&1
$UUENCODE $FILENAME.zip $FILENAME.zip | mailx -s "MRTG Report for $DIR - $TODAY" $MAILTO
rm $FILENAME.zip
done

posted by Brahma at 3:15 PM 0 comments

Summary: probably over a million files to delete

Subject: Summary: probably over a million files to delete

Thanks all.

Some of the replies :

A couple of replies say it will still take some time to complete the deletion, so I probably can do "nohup <delete command> &" to let it run by itself in the background.

As I need to remove files older than 180 days, I chose :

cd <dir where files exist>
find . -xdev -type f -mtime +180 -exec rm {} \+ &

==========

Other replies :

Mount the filesystem with logging on. You can do this without unmounting the filesystem:

mount -o remount,logging <dev> <filesystem>

-----------

cd <directory>
find . -name '*2004*12' -o -name '*2004*11*' -o .... | xargs rm -f &

-----------

1. find . -exec rm {} \;
2. for x in `ls`; do rm $x ; done

-----------

You ran into a limitation on the ability of the shell. I have run into similar problems in the past. Here is how I worked around it. It requires the use of perl. It will remove all files within the directory where executed. Test to verify for yourself how it works.

ls | perl -ne 'chomp; unlink'

posted by Brahma at 3:14 PM 0 comments

find and no NFS

Subject: Summary: find and no NFS

The original post was about locating world-writeable files and setuid files... etc.


I want to limit the find to local filesystems only. I don't want multiple NFS clients running the find on the same NFS export. I got some permission errors and "Stale NFS file handle" errors with "-local".

There were many, many replies. Sorry, I can't list everyone who replied. The suggestions are:

1. It is your fault, RTFM again.
   The manpage says -local *WILL* descend the hierarchy of non-local directories.

2. Check out "-prune"
   Always yields true. Does not examine any directories or files in the directory structure below the pattern just matched.

3. Use -xdev
   Same as the -mount primary.

4. try -mount

5. Manpage! There's an example in the manpage.
   Example 9: Printing local files without descending non-local directories
   % find . ! -local -prune -o -print

Answer
========
1. The main point is that "-local" still descends NFS mounted directories. That's why I got some errors.
2. To get the job done ---> Since I want find to drill down from / on many systems with probably different filesystem setups, I can't use "-mount" for each filesystem.

Darren Dunham pointed out that

find / \( -fstype ufs -o -fstype vxfs \) -type f -perm -6000 -print

>All that happens there is that the result on remote files is false. It
>doesn't stop the recursion. You need -prune for that. That's what
>Example 9 in the man page does.

Dan Stromberg suggested
> find . -fstype nfs -prune -o -print

Nope, I don't have an exact answer yet. I was out yesterday. I played with this a bit this morning. The following seem good so far:


find / -fstype nfs -prune -o -perm -2 -print
find / \! -local -prune -o -perm -2 -print

Thanks a lot ... and I'll remember to boot up my brain before reading the man page.

posted by Brahma at 3:13 PM 0 comments

touch: /opt/FrontBase/Backups/Test cannot create

Dear managers,

I got a bunch of responses to my summary telling me that using anon=0 is no good. It gives root access to everyone which is definitely not what I want. My first attempt

share -F nfs -o rw,root=merkur /opt/FrontBase/Backups

failed. I finally tried it with the FQDN

share -F nfs -o rw,root=merkur.smartsoft.internal /opt/FrontBase/Backups

and this works like a charm.

Thanks a lot!

Regards,

Andreas

> Mike suggested to do
>
> share -F nfs -o rw,root=<hostname or ipaddress of client> /opt/FrontBase/Backups
>
> This did not make a difference. I then followed the suggestion of
> Saxon and Michael.
>
> share -F nfs -o ro,anon=0 /export/install
>
> This worked! Thanks a lot!
>
> Original posting:
>
> Dear managers,


>> I have exported a directory on a Solaris 8 machine as follows
>>
>> share -F nfs -o rw /opt/FrontBase/Backups
>>
>> and mounted the directory on a Sol 9 machine with
>>
>> mount -F nfs 10.0.0.200:/opt/FrontBase/Backups /opt/FrontBase/Backups
>>
>> mount tells me that the directory is mounted read write.
>>
>> /opt/FrontBase/Backups on 10.0.0.200:/opt/FrontBase/Backups
>> remote/read/write/setuid/xattr/dev=4900003 on Wed Aug 24 14:22:55 2005
>>
>> However, creating a file as root on the Sol 9 machine fails:
>>
>> bash-2.05# touch /opt/FrontBase/Backups/Test
>> touch: /opt/FrontBase/Backups/Test cannot create
>>
>> What am I missing?

posted by Brahma at 3:12 PM 0 comments

a cheatsheet with Sun commands to manage these storage arrays?

Original question:
I just inherited a bunch of A1000, D1000, D5200 arrays, connected to Sun's 4500, 6500, E420 etc. Some are standalone, some Veritas VxVM/VCS controlled. Does anybody have a cheatsheet with Sun commands to manage these storage arrays?

Darren Dunham from TAOS.COM summed it best:

A1000/A3500 have on-board RAID controllers. You need to grab RAID Manager 6.22.1 from Sun (and not be running Solaris 10) to manage them.

'rm6' is the GUI tool that's generally used.

D1000 is JBOD scsi. No real commands here.

A5x00 are JBOD fiber. The only commands you might need here would be 'luxadm' to aid online/offline and discovery of individual drives. You might also need 'cfgadm' to configure controllers or targets up and down.

posted by Brahma at 3:11 PM 0 comments

Thursday, September 08, 2005

veritas free space

Run vxdg -g <diskgroup> free and see if you have any disk space left.

Zakaria Mattar wrote:

>Hi,
>
>I am trying to create a volume in a disk group having 96 GB free space
>but I am getting "No more space in disk group configuration".

posted by Brahma at 1:16 PM 0 comments

Cron format

anonymous wrote:
> How can I make a certain task run with cron on every month's first
> Sunday (or the month's first full week's Sunday)?
> And how about every second Sunday?

I don't think cron has support for that built into it.

So, you will have to go another route. The easiest way is to run the cron job every day for the first 7 days of the month. Only one of those days will be a Sunday, so just check if the current day is Sunday and exit if it isn't Sunday.

For example:

#! /bin/sh

if [ `/usr/bin/date '+%Ow'` -ne 0 ]
then
    # it's not Sunday, so exit
    exit 0
fi

# rest of cron job goes here
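A hypothetical crontab entry to drive it (the script path is just an example; the script itself exits unless the day is a Sunday):

0 2 1-7 * * /path/to/first-sunday-job.sh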


The second Sunday of the month can be done in a similar way. Since the first Sunday will fall on day 1-7, and since the second Sunday is 7 days after the first, the second Sunday will fall on day 8-14. So, run the job every day from day 8 to day 14, and check if it's a Sunday.

I'm not sure what you mean by "month's first full week's Sunday". If you mean the Sunday that falls during the first full week of the month, then to me that is equivalent to the first Sunday of the month, because the week begins on Sunday (in the United States, at least).

Hope that helps.

- Logan

Reply

> Actually you should be able to set the date 1-7 and the day of the week
> to 0 (or sun). As cron only executes the command if all parameters
> match, this will only run when day 1-7 falls on a Sunday, so no need to
> build the logic into your command.

I know you can specify by day-of-month and by day-of-week, but the question is whether, when you do this, cron interprets it as the union or the intersection of the two sets of days.

Looking at "man crontab", it would appear that the rule is a bit counter-intuitive: it seems to be that it's the union, unless one of the fields is just the asterisk ("*"), in which case it's the intersection.

I'm basing this on two examples from the crontab manual page. One says that

0 0 1,15 * 1

would run the command "on the first and fifteenth of each month, as well as on every Monday" (so the union of the two sets), and then another example says that

0 0 * * 1

would run the command "only on Mondays" (which would be the intersection of the two sets, since in this case "*" means every day of the month).


I guess I've never looked at cron in this much depth before, but it occurs to me that this notation is really confusing! The rules aren't clear about when it's intersection vs. union.

posted by Brahma at 1:15 PM 0 comments

Solaris X86 Jumpstart using DHCP and PXE

Subject: Summary: Solaris X86 Jumpstart using DHCP and PXE

Here was my question:

I need to install 460 PCs with Solaris 10 3/05. My boot and install server is based on an Ultra SPARC 250 system. If I install any PC using rarp, everything works perfectly. The PC is installed automatically using the configuration stored in the sysidcfg, etc, etc.

But here is the problem. All 460 PCs need to be installed using the DHCP and PXE procedure.

The PC boots from the net perfectly, but for some reason it ignores the sysidcfg and starts an interactive install.

I have done:

- ran the add_install_client
- checked the files in /tftpboot
- added a macro to my DHCP server
- added a PXE macro to my DHCP server
- added a client entry which includes the network_ip macro
- added all the vendor options to the network_ip macro

Solution:

Create a "system.conf" file under /etc/netboot in the jumpstart directory tree and add the following entries to the system configuration file:

SsysidCF="JumpStart-Server:/wrk/jumpstart"
SjumpsCF="JumpStart-Server:/wrk/jumpstart"

And voila!!!!

Thanks to Andrew

posted by Brahma at 1:14 PM 0 comments


Panic while checking devices during a reboot

Problems and their solutions

* Panic while checking devices during a reboot

Occasionally when booting the server, a message similar to the following may appear:

Configuring /dev and /devices

panic[cpu1]/thread=30001218fe0: bad kernel MMU miss at TL 2

%tl %tpc             %tnpc            %tstate    %tt
 1  0000000001035cc4 0000000001035cc8 4480001605 068
    %ccr: 44 %asi: 80 %cwp: 5 %pstate: 16<PEF,PRIV,IE>
 2  00000000010071f0 00000000010071f4 9180001506 068
    %ccr: 91 %asi: 80 %cwp: 6 %pstate: 15<PEF,PRIV,AG>
%g0-3: 0000000000000000 0000000000000002 0000000077196900 0000000077196900
%g4-7: 0000000000000000 0000000000000002 000000000140a4c0 0000000000000068
Register window 6, caller kmem_cache_alloc+64
%o0-3: 000003000000bb88 000002a1002c6000 0000000000000031 0000000000000000
%o4-7: 00000000000002a1 000000007804f960 000002a1002c7501 00000000010ac520
%l0-3: 0000000001035cc4 0000000001035cc8 0000004480001605 000000000102bf54
%l4-7: 000003000000bd30 000003000000bd58 0000000000000000 000002a1002c7db0
%i0-3: 000003000000b408 0000030001218fe0 0000000000000000 0000000000000000
...

This appears to be caused by one of the server CPUs becoming slightly unseated. The CPUs are secured to the motherboard by screws which are tightened using the torque screwdriver provided with each server. The CPU retaining screws in every server we have put into service so far have been far looser than they should be, leading to this problem occurring. The screws have now been tightened, but should this problem occur, the first thing to try is re-tightening the screws. The screwdriver can be found to the front of the server when the cover is removed (it's bright green and is hard to miss) and instructions for tightening the screws are attached to the top of the CPUs.

posted by Brahma at 1:13 PM 0 comments

Mike's goodies


more README
This shar archive contains all the "Auto Admin" files which originated on the HP-UX systems running HP-UX 9.01 in the Department of Chemistry at the University of Toronto. Except for "Localstuff", these files are now being used on our SGI IRIX 5.X and 6.X systems, SunOS 4.1.X systems, Solaris 2.X systems, on Linux Red Hat 5.X, and on FreeBSD 3.X. Unfortunately, I don't have a newer HP system to check that these scripts still work properly on HP-UX.

The 'autoadmin.tex' file contains the text of the talk I gave at the InterWorks Conference in Orlando (May 1994), but does not completely describe all the "Auto Admin" files in this archive, or all their features, since the files here are my latest versions.

Look over the files CAREFULLY before using them on your system.

The 'installhp.out' file is located in '/usr/local/doc' on my systems. The rest of the files are in '/usr/local/bin' on my systems.

You will also need to create a soft link in '/usr/local/bin':

ln -s Find_setuid Find_links

The crontab entries used to run these commands are:

#
# General checks.
#
0,15,30,45 * * * * /bin/csh -c /usr/local/bin/Hourlystuff >/dev/null 2>&1
00 * * * * /bin/csh -c /usr/local/bin/Hourlystuff >/dev/null 2>&1
#
# Accounting and clean up.
#
# Do cpu accounting and other stuff at 6:15 AM every day.
15 6 * * * /bin/csh -c /usr/local/bin/Dailystuff >/dev/null 2>&1
# Do disk accounting and other stuff at 12:30 AM every Sunday.
30 0 * * 0 /bin/csh -c /usr/local/bin/Weeklystuff >/dev/null 2>&1
# Do cpu accounting and other stuff at 6:45 AM every month.
45 6 1 * * /bin/csh -c /usr/local/bin/Monthlystuff >/dev/null 2>&1
# Do Teststuff - activate when needed.
#00,05,10,15,20,25,30,35,40,45,50,55 8-16 * * * /bin/csh -c /home/mikep/bin/Teststuff >/dev/null 2>&1

and should be installed by root with 'crontab -l >/tmp/root', edit "/tmp/root", 'crontab -r', 'crontab /tmp/root'.

I have included 'installhp.out' which contains the HP-UX system install/update steps that are carried out manually; some of these could probably be added to the 'Localstuff' given some effort. I'd like to hear about other steps which you automate.

You might also want to check out the following URL for some other information on automating various aspects of system administration: http://www.oac.uci.edu/support/ddcs/automation/.

posted by Brahma at 1:13 PM 0 comments

Friday, September 02, 2005

Accessing metadevices when booting from CDROM

This does assume that the system isn't using DiskSuite or Veritas, however - if the system is mirrored, that'll have to be undone [1] before rebooting after editing anything, otherwise I recall there's the risk of filesystem corruption.

[1]=For DiskSuite, an overview is at: http://unixway.com/vm/disksuite/bootcdrom.html. Or, as it's Solaris 9, he could just load the DiskSuite driver: http://sunportal.sunmanagers.org/pipermail/summaries/2005-July/006661...

SUMMARY: Accessing metadevices when booting from CDROM?
Guy D. dy7t at yahoo.com
Sun Jul 24 16:28:12 EDT 2005


Thanks to everyone who responded:

"Santhakumar, Siva" <Siva.Santhakumar at navitaire.com>
"Sandwich Maker" <adh at an.bradford.ma.us>
"Thomas M. Payerle" <payerle at physics.umd.edu>
"Petri Kallberg" <Petri.Kallberg at Sun.COM>

The definitive answer came from Petri who (probably because of where he works) pointed me to a Sun document that gives step-by-step directions on performing this procedure.

It is Document ID: 75210 (Solaris[TM] Volume Manager Software and Solstice DiskSuite[TM] Software: Mounting Metadevices). I am not sure if you need to have a Sun support contract to read it (try searching at sunsolve.sun.com for the document ID.) I will just copy it here for anyone who is interested:

---------------------------------------------------
Problem Statement:

How can you access data which is located on a mirrored, RAID5, or a concatenated metadevice when the system is booted from a CD-ROM into single user mode? (For example, reset the root password on a mirrored root partition.)

Resolution:

1) Boot to single user mode using the Solaris[TM] 9 OE 1/2 CDROM (or DVD).

ok boot cdrom -s
...

2) Find the Solaris[TM] Volume Manager md driver and unload it.

# modinfo | grep md
 38 11d1703   ff9    - 1 md5 (MD5 Message Digest Algorithm)
113 12f1b02  1ecf   70 1 ramdisk (ramdisk driver v1.15)
127 705c2000 2375a  85 1 md (Solaris Volume Manager base mod)
# modunload -i 127

For Solaris 9 Operating System (Solaris OS) metadevices:

3) Mount one of the sub-mirrors of your root metadevice as read-only to get a copy of metadb configuration information. NOTE: You'll need to mount a regular disk device for this step.

# mount -r /dev/dsk/c0t0d0s0 /a
# cp /a/kernel/drv/md.conf /kernel/drv/md.conf
# umount /a


For pre-Solaris 9 OS metadevices:

3) Before Solaris 9 OS, information about metadbs was stored in the /etc/system file instead of in /kernel/drv/md.conf, and the format used was slightly different.

Mount one of the sub-mirrors of your root metadevice as read-only to get a copy of metadb configuration information. NOTE: You'll need to mount a regular disk device for this step.

# mount -r /dev/dsk/c0t0d0s0 /a
# cp /a/etc/system /tmp/system
# umount /a

Find metadb information from /tmp/system, for example:

* Begin MDD database info (do not edit)
set md:mddb_bootlist1="sd:7:16 sd:7:1050 sd:7:2084 sd:15:16 sd:15:1050"
set md:mddb_bootlist2="sd:15:2084"
* End MDD database info (do not edit)

This information can be converted into a format that Solaris 9 OS understands simply by adding ":id0" after each metadb identifier. These lines are then added to the end of /kernel/drv/md.conf.

The previous example would then look like the following:

mddb_bootlist1="sd:7:16:id0 sd:7:1050:id0 sd:7:2084:id0 sd:15:16:id0 sd:15:1050:id0";
mddb_bootlist2="sd:15:2084:id0";

NOTE: Remember to add ";" at the end of each line!!!

4) Load the Solaris Volume Manager md driver and synchronize metadevices:


# modload /kernel/drv/md
# metasync -r

5) READY!! Now you're able to use your existing metadevices as usual. To view your metadevice configuration and status, use the metastat command.

# metastat

---------------------------------------------------

OK, I did not actually try this, but it looks like it should work. It still seems like way too much work to me that I need to add a bunch of lines in cryptic format to /kernel/drv/md.conf just so it can figure out where the metadevice database replicas are located.

Let me add one last story to this drama. Earlier, I booted the system using a Solaris 9 install CD. But, I forgot to boot the system with "boot cdrom -s" and just used "boot cdrom". So, the system booted off the CD and started running the processes for installing a new OS (asking me what language I want to use, etc.) I was using a graphics terminal, so I just ignored that and opened another xterm window. I was astonished to see (after an unknown amount of time) that when I entered "metadb" and "metastat" I was able to see all of my existing metadevice databases and metadevices! But, after another period of time, I was not! I figured that the Solaris installer must be smart enough to probe for old metadevices and would probably offer me the option to retain them if I continued to install the OS.

One person assumed that I had a RAID-1 root filesystem, and suggested that I boot from CD, mount /dev/dsk/cXdXtXsX (directly specifying a known, good root partition slice), then change the /etc/vfstab file on that slice to indicate the root filesystem should be mounted from /dev/dsk/cXdXtXsX rather than /dev/md/dsk/dXX, then reboot into that slice.

Two people pointed out that the default metadb size became larger starting in Solaris 9 and were worried that might be causing the trouble. But this turned out not to be a factor.

Thanks again for everyone's help

My original question is below:

--- "Guy D." <dy7t at yahoo.com> wrote:

> Hello,
>
> I have a Solaris 8 system with several RAID 0 and RAID 1 filesystems created with Solstice Disk Suite (SDS).
>
> Now, I believe that SDS was incorporated into Solaris 9 & 10 (and renamed to Solaris Volume Manager.) I was wondering how I could access my RAID filesystems if I lost the root filesystem and had to boot from CD.
>
> The Solaris 8 CD does not support SDS (it was an optional package.) But I thought if I used a Solaris 9 or Solaris 10 install CD, I would be able to configure them to be able to access the SDS metadevices.
>
> But when I finally tried it, I was not successful. It seemed to me that I should have just needed to use "metadb -a <slices containing metadb replicas>". Any advice for what I need to do? I know where all the metadb replicas are located.
>
> I would think this would be a fairly common task (recovering data from metadevices if the root filesystem fails) but I have not been able to find any answers.
>
> Thanks,
> Doug


posted by Brahma at 2:57 PM 0 comments

capture important information about processes

This simple ad hoc script could help to capture important information about processes, but could also be easily modified to do additional things -- such as start the process if it isn't running or log the number of such processes that are running at each check interval. The complete script appears below for easy cutting and pasting.

<pre>
#!/bin/bash
# watchProc: monitor a process and log start and stop times

sleepInterval=60
logDir=/var/tmp

# ---- is proc name supplied? ----
if [ "$1" = "" ]; then
    echo -n "watch?> "
    read procName
else
    procName=$1
fi

# ---- determine if process(es) are running at invocation ----
prevStatus=`pgrep $procName | wc -l | awk '{print $1}'`
dt=`date`

# ---- if proc not currently running, ask user to confirm monitoring ----
if [ "$prevStatus" = 0 ]; then
    yn=`ckyorn -p "$procName is not running now -- monitor anyway?" -d y`
    case $yn in
        [Yy]*) echo "$dt: starting monitor task for $procName" | tee -a $logDir/$procName.log
               echo "$dt: $procName is down" | tee -a $logDir/$procName.log ;;
        [Nn]*) echo ok
               exit ;;
        *)     echo exiting
               exit ;;
    esac
else
    echo "$dt: starting monitor task for $procName" | tee -a $logDir/$procName.log
    echo "$dt: $procName is running" | tee -a $logDir/$procName.log
fi

# ---- continue monitoring at configured interval ----
while true
do
    dt=`date`
    newStatus=`pgrep $procName | wc -l | awk '{print $1}'`
    if [ $newStatus != $prevStatus ]; then
        case $newStatus in
            0) echo "$dt: $procName is down" >> $logDir/$procName.log ;;
            *) echo "$dt: $procName is running" >> $logDir/$procName.log ;;
        esac
        prevStatus=$newStatus
    fi
    sleep $sleepInterval
done
</pre>
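A quick usage sketch (assuming the script is saved as watchProc and made executable; sendmail is just an example process name, chosen so no interactive prompt appears if it is already running):

./watchProc sendmail &
tail -f /var/tmp/sendmail.log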

posted by Brahma at 2:57 PM 0 comments

Issue with join command


Re: Issue with join command
Posted By Don Rowland On Saturday, August 13, 2005 at 8:11 PM

Amit S

Try this:

file1 contents:
file1 - line1
file1 - line2
file1 - line3
file1 - line4

file2 contents:
file2 - line1
file2 - line2
file2 - line3
file2 - line4

Shell script append:

#! /bin/ksh
#
## append line from file2 to end of line from file1
#
awk '{
FRONT = $0
getline BACK < "file2"
print FRONT BACK
}' file1

Output:

./append
file1 - line1file2 - line1
file1 - line2file2 - line2
file1 - line3file2 - line3
file1 - line4file2 - line4

No space between FRONT and BACK.
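For what it's worth (not in the original reply), the standard paste utility can produce the same side-by-side result without awk; a minimal sketch:

# join each line of file1 with the corresponding line of file2, no separator
paste -d '\0' file1 file2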

Don

posted by Brahma at 2:54 PM 0 comments

Application Server crash


Problem

MustGather for problems with a WebSphere Application Server crash on the Solaris platform. Gathering this information before calling IBM support will help familiarize you with the troubleshooting process and save you time.

Solution

This document lists what is needed to begin troubleshooting an Application Server crash on the Sun Microsystems Solaris platform.

If you have already contacted support, continue on to the component's specific MustGather information. Otherwise, click: MustGather: Read first for all WebSphere Application Server products.

Crash on Solaris specific MustGather information

1. Important: Before executing any of the following instructions, please make a backup copy of the core file. Perform the following for each core file (the core file will be in the working directory, which is /opt/WebSphere/AppServer/bin by default).

2. If running Solaris 8 or above, enter the following from a command line:

/usr/proc/bin/pstack [core] > pstack.out
/usr/proc/bin/pmap [core] > pmap.out
/usr/proc/bin/pldd [core] > pldd.out

where:
[core] is the name of the core file
pstack.out is changed for each core file
pmap.out is changed for each core file
pldd.out is changed for each core file
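If several core files need to be processed, a small loop can save some typing. This is only a sketch -- it assumes the cores sit in the default working directory and follow the usual core* naming, which may not match your configuration:

cd /opt/WebSphere/AppServer/bin
for core in core*; do
    # one set of output files per core file
    /usr/proc/bin/pstack "$core" > "pstack.$core.out"
    /usr/proc/bin/pmap "$core" > "pmap.$core.out"
    /usr/proc/bin/pldd "$core" > "pldd.$core.out"
done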

3. If not running Solaris 8 or above and dbx is installed on the Solaris machine, enter the following from a command line:

./dbxtrace_sun.sh [executable] [core] > dbxtrace.out

where:
[executable] is /opt/WebSphere/AppServer/java/bin/sparc/native_threads/java
[core] is the name of the core file
dbxtrace.out is changed for each core file


4. If not running Solaris 8 or above and gdb is installed on the Solaris machine, enter the following from a command line:

./gdbtrace.sh [executable] [core] > gdbtrace.out

where:
[executable] is /opt/WebSphere/AppServer/java/bin/sparc/native_threads/java
[core] is the name of the core file
gdbtrace.out is changed for each core file

5. In addition to the preceding information, capture the following:

* For the V6.0 release:
o All files in the <install_root>/profiles/<ProfileName>/logs/<ServerName> directory.
o A copy of server.xml, located in the <install_root>/profiles/<ProfileName>/config/cells/<CellName>/nodes/<NodeName>/servers/<ServerName> directory.
* For the V5.0 and V5.1 releases:
o Include all of the files from the <install_root>/logs/<ServerName> directory.
o A copy of server.xml, located in the <install_root>/config/cells/<CellName>/nodes/<nodeName>/servers/<ServerName> directory.
* For the V3.5 and V4.0 releases:
o Include all of the files from the <install_root>/logs directory.
o A copy of XMLExport for the server configuration.
* env > env.out
* ulimit -a > ulimit.out
* uname -a > uname.out
* showrev -p > showrev.out
* pkginfo -l > pkginfo.out
* /var/adm/messages
* All hs_err_pid*.log files
* Note: For all versions, if you have configured the Application Server to write logs into a different location, send them accordingly.

6. Follow the instructions to send diagnostic information to IBM support.

For a listing of all technotes, downloads, and educational materials specific to the Crash component, search the WebSphere Application Server support site.


Related information
Ten Steps to Getting Support

posted by Brahma at 2:54 PM 0 comments

Solaris application crashes

When a Solaris application crashes, it usually produces a core file, which is a disk copy of the application's memory at the time of the crash.

One way to generate a traceback is to use a debugger such as dbx with the core file:

% dbx /path/executable core
(dbx) where > traceback.t
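If gdb is available instead of dbx, an equivalent non-interactive traceback can be captured along these lines (a sketch, not from the original post):

% echo bt > /tmp/gdbcmds
% gdb -batch -x /tmp/gdbcmds /path/executable core > traceback.t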

Q13: A KDE application crashed and I want to file a bug report at http://bugs.kde.org, but the backtrace in the KDE Crash Manager is "useless". What can I do?

A13: (The KDE Crash Manager is DrKonqi.) To get backtraces from crashing applications on Solaris you need to run them in gdb or use DTrace (Solaris 10). Additionally, most of the Solaris binary packages are not compiled with debugging information enabled, so you will probably have to recompile them first:

- To enable debugging support in a KDE port, build it with WANT_KDE_DEBUG defined. You can set it on the command line (example: make -DWANT_KDE_DEBUG && make install) or put it into /etc/make.conf (WANT_KDE_DEBUG=YES). A similar switch exists for enabling debugging symbols in QT: WANT_QT_DEBUG. Note that those switches ONLY work in the ports of the main KDE module ports like kdebase, kdemultimedia, arts, etc. They will not work in ports of 3rd party KDE apps like, for example, kmldonkey or kbear.
- To run an application in gdb, do: gdb /path/to/application. This will give you a prompt:

(gdb)

You can run the program now by typing run:

(gdb) run

Then you make the program crash. You will be returned to the prompt. Type bt:

(gdb) bt

and you will get the backtrace you want. Some KDE applications spawn a new process after being launched and thus don't stay under gdb's control - an example of such an application is kmail. To debug those, you need to type

(gdb) set args --nofork

prior to run at the gdb prompt.

posted by Brahma at 2:53 PM 0 comments

Solaris GUI not displaying

Solaris GUI not displaying

Hi. I'm a newbie to Solaris so please be gentle!! <grin>

Have a problem getting the KDE or CDE (?)... GUI running.

I have a SunFire V880 with Solaris 9 and Oracle 10G currently running. The box and apps were running fine and the GUI displayed as well. The Oracle DBA wanted to "tune" memory so he made a change to a parameter in the /etc/system file. Upon completing this change he gracefully shut down the box (init 6). The box failed to boot up and the display would not show at all, dark screen. So we thought it might be a problem in the KVM and therefore direct connected a Sun monitor to the box. This showed us a command line and the OS complaining about a memory error.

I arrived on site and initiated a boot from the Solaris 9 CD, mounted the partition and edited the /etc/system file, restoring the original parameters.

Rebooted the box and (it took a Looonnnggg while, 10+ minutes) the OS started, Oracle started, but the X server failed.

So my customers are happy that the dB is running but I can't get X to start.

Help!!!

Thanks in advance ---Kenny

Kenny,

What happens if you do the following at the command prompt:

/usr/openwin/bin/xinit

Do you have /usr/openwin/bin as part of your PATH environment variable?

Mike


Mike,

Hello, When I enter "xinit" I receive the following:

X: syntax error at line 1: `(' unexpected
xinit: server error

Thanks for helping!

Kenny

Going that route, you could try entering it into your PATH statement.

PATH=$PATH:/usr/openwin/bin

I've recently been going thru the same stuff and found that if I entered the following, as long as I was running and configured for 'X', my GUI came up. Simply type:

/usr/dt/bin/Xsession &

Good luck!,

Chris Barbot

Kenny,

Are you trying to start X from the console or a remote machine? If you are using a remote machine, are you using "telnet"? If you're on a remote machine make sure you do the following:

1. Before logging into the server, run the following command: "xhost <server hostname>"
2. After logging into the server, run the following command: "export DISPLAY=<hostname of your remote machine>"

I run the k-shell so your syntax for #2 will be different if you use csh or sh. The error that is returned when you run xinit is indicative of a shell error, not a program error. Did you run "xinit" or "/usr/openwin/bin/xinit"? Determine if you are running the proper "xinit" command by using "which xinit". You can also use "file /usr/openwin/bin/xinit" to determine if the "xinit" command you are calling is a shell script or an executable.
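For reference, the DISPLAY syntax per shell looks like this (a sketch, not from the original thread; myworkstation is just a placeholder, and you normally want the :0.0 display suffix as well):

# ksh / sh:
DISPLAY=myworkstation:0.0
export DISPLAY

# csh:
setenv DISPLAY myworkstation:0.0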

Mike

posted by Brahma at 2:52 PM 0 comments


here command in unix shell script function

here command in unix shell script function

Hello All,

I use a here document whenever I need to connect to Oracle through a korn shell script, e.g.

sqlplus user/passwd@database << EOF
SELECT * from tab;
EOF

But I am not able to do this from a function written in a korn shell script.

e.g.

raise_error() {

## some code
sqlplus user/passwd@database << EOF
SELECT * from tab;
EOF

## some code
}

When I try to run this shell script, it gives me a syntax error: Message: syntax error at line 66: << unmatched.

Please, can anybody help me with how I can connect to Oracle from a unix shell script function?

Thanks,

Nice

Posted By Reggie Beavers On Monday, August 22, 2005 at 10:53 AM

Hi Nice,

Try it this way:

echo 'select * from tab;' | sqlplus user/pass@db /nolog
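Another angle worth checking (not part of Reggie's reply, just a common cause of "<< unmatched"): the terminating EOF must begin in column 1 -- indenting it to match the function body breaks the here document. A here document inside a ksh function can work along these lines:

raise_error() {
    ## some code
    sqlplus user/passwd@database << EOF
SELECT * from tab;
EOF
    ## some code
}

(With <<-EOF instead of << EOF the terminator may be indented, but only with tabs.)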


Regards,
--
Reggie Beavers

posted by Brahma at 2:51 PM 0 comments

What exactly is "rss" from ps and prstat ?

What exactly is "rss" from ps and prstat ?

Got a quick query - what exactly is the "rss" or resident set size of a process, as reported by "ps"? My understanding from the man page is that it is the size of the program actually in memory, as opposed to its size in virtual memory.

However, this doesn't seem to be the case. If I run a quick script to add up the total amount of memory used by the 'httpd' processes:

# ps -e -o comm,rss | grep httpd | awk '{sum+=$2} END {print sum}'
11354832

So it appears that httpd processes (Apache) are taking up 11354832 kilobytes, or nearly 11 gigabytes. This is also confirmed by the output from prstat, which also shows 11G in the RSS field.

Which would be a nifty trick, given that the box only has 4Gb...

I assume I've totally misunderstood what RSS refers to; if anyone couldclarify this for me, it would be greatly appreciated!

Thanks,

> Which would be a nifty trick, given that the box only has 4Gb...

Thanks to shared libraries and fork()s without subsequent exec()s, a significant part of the memory is shared.

AAh, that certainly explains it. Thanks.

Could someone explain what the best method is to obtain the figure I'm after then? Ideally, I'd like to find out the total amount used by each process, excluding shared pages, and then how much memory is shared between them. These figures could then be added up to show the "real" amount of memory used by the given set of processes.


pmap -x looks like it may hold the answer, but I'm still not entirely clear what I need to look for. Looking at Rich Teer's "Solaris Systems Programming", it appears that I'd want to add up the amount used by the stack and heap of each process. Is this correct?

T> stack and heap of each process. Is this correct ?

I'm afraid that things are more difficult thanks to copy-on-write, which is in effect after fork(). That means that some of the pages of a stack or heap can still be shared while other pages have already been copied. And there are many other mappings that require swap space. Take initialized and uninitialized variables, for example, or storage areas that have been allocated through mmap(2) with MAP_ANON added to the flags.

Andreas.
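One rough way to get a per-process figure anyway (a sketch added here, not from the thread, and subject to all the copy-on-write caveats above) is to sum one column of the "total Kb" line that pmap -x prints. The column holding private/anon Kbytes differs between Solaris releases, so check the pmap -x header on your system and adjust COL:

#!/bin/sh
# approximate non-shared memory of all httpd processes, in Kbytes
COL=${COL:-5}    # index of the private/Anon column in the "total Kb" line
total=0
for pid in `pgrep httpd`; do
    kb=`pmap -x $pid 2>/dev/null | nawk -v c=$COL '/^total/ {print $c}'`
    [ -n "$kb" ] && total=`expr $total + $kb`
done
echo "approx. non-shared memory: ${total} Kbytes"

(httpd is just the example process name from the thread; substitute whatever you are measuring.)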

posted by Brahma at 2:49 PM 0 comments

Extremely poor network performance after OBP

Subject: Re: [SunHELP] Extremely poor network performance after OBP upgrade (and OS reinstall)

It may not be the Sun at all. If the other end is a Linux machine, there is a bug in the e1000 driver in the 2.6 kernel distributed with RHEL 4.1 which causes nice Intel 1000 Mbit cards to perform worse than 10 Mbit cards in certain cases with TCP. A workaround is here:

http://dsd.lbl.gov/TCP-tuning/linux.html

The difference is rather dramatic:

# ethtool -K eth1 tso on
# iperf -c 192.168.9.10
...
[ 3]  0.0-21.4 sec  4.77 MBytes  1.87 Mbits/sec

posted by Brahma at 2:48 PM 0 comments

benchmark with 'time mkfile 650M testfile'

Hi,

we use an old E450 (2 x UltraSPARC-II 400MHz) with Sun Solaris 8 installed - mainly as a fileserver.


The two internal disks are mirrored with SDS and used for /, /usr, /var, /local, /opt...

Attached to the E450 is a StorEDGE A1000 set up as RAID5 and used as a separate mount point, /projekte.

I use UFS logging for all disks.

A quick benchmark with 'time mkfile 650M testfile' showed the following results.

* RAID5 device /projekte
$ time mkfile 650M testfile

real    0m31.310s
user    0m0.230s
sys     0m10.710s

15MB/s might be ok for RAID5.

* mirrored device /local
$ time mkfile 650M testfile

real    4m49.407s
user    0m0.180s
sys     0m11.940s

That's only ~2MB/s!

How can I debug the problem further? Any hints?

Thanks, Ralf

hi

i don't have an answer, but that's really poor. here is the result from my old e450 - not a mirrored fs - as a value to compare.

# time mkfile 650M testfile

real    0m20.32s
user    0m0.23s
sys     0m10.33s


and this is the result of a mirrored fs on a v880, but the volume manager is from veritas.

real    0m16.82s
user    0m0.13s
sys     0m5.77s

best regards
hans
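One way to dig further (a suggestion added here, not from the original thread) is to watch per-device service times and %busy while the mkfile runs; a sub-mirror with a much higher asvc_t than its partner usually points at the slow disk:

# in one window:
iostat -xn 5
# in another:
time mkfile 650M /local/testfile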

posted by Brahma at 2:48 PM 0 comments

Correct way to find swap size/usage

Correct way to find swap size/usage

Hello,
Which is the best way to find out the total swap space and used swap space on a Solaris machine?
There are 2 or 3 options, all giving different values.
1. sar -r
2. swap -l
3. swap -s
4. df -k swap

> Vadi

kstat unix:0:vminfo:*


What do you mean by 'swap'? There are at least two common definitions, depending on which utility you use.

> There are 2 or 3 options, all are giving different values.
> 1. sar -r
> 2. swap -l
> 3. swap -s
> 4. df -k swap

Don't forget 'top' and 'vmstat'.

Did you read the man pages for them and see what values they are displaying? They seem rather clear to me....


I'm actually somewhat annoyed by what 'df -k swap' chooses to display. (I think it should return nothing.) 'df' is a utility to show filesystem information. Using it to understand overall system swap configuration is not recommended.

Is it ok if we use the output of swap -l - this gives the disk/swapfile usage.
swap -s gives the swap usage including some portion of physical memory.

vmstat gives only available swap, not total swap.
top gives 2 fields: in use & free - can we add these?

basically some person will just have to run a command and copy the values and send them for reporting (as a %) - so if there is a direct, simple way to get the values... Thanks!

R> direct,simple way to get the values...

swap -s | nawk '{
    gsub("k$","",$9); gsub("k$","",$11);
    used=$9; total=$9+$11;
    printf("VM\nTotal: %5d MB\n",total/1024);
    printf("Used : %5d MB ",used/1024);
    printf("(%d%%)\n",(used*100)/total);
}'

posted by Brahma at 2:47 PM 0 comments

time and /bin/time

time and /bin/time differ primarily in that time is built into the C shell. Therefore, it cannot be used in Bourne shell scripts or in makefiles. It also cannot be used if you prefer the Bourne shell (sh). /bin/time is an independent executable file and therefore can be used in any situation. To get a simple program timing, enter either time or /bin/time, followed by the command you would normally use to execute the program. For example, to time a program named analyze, enter the following command:


% time analyze inputdata outputfile
9.0u 6.7s 0:30 18% 23+24k 285+148io 625pf+0w

This indicates that the program spent 9.0 seconds on behalf of the user (user time), 6.7 seconds on behalf of the system (system time, or time spent executing UNIX kernel routines on the user's behalf), and a total of 30 seconds elapsed time. Elapsed time is the wall clock time from the moment you enter the command until it terminates, including time spent waiting for other users, I/O time, etc.
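As a small illustration of the /bin/time variant in a Bourne shell script ("analyze" and its arguments are just the example program from the text; note that the timing report goes to standard error):

#!/bin/sh
# time the run and keep the report in a file
/bin/time analyze inputdata outputfile 2> timing.out
cat timing.out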

posted by Brahma at 2:46 PM 0 comments

Average Command Runtimes with runtime

39.4 Average Command Runtimes with runtime

The time command (39.2) will time a single run of a command - but the results can vary from run to run. The runtime script runs a command the number of times you specify, then averages the results. For example:

% runtime -5 getdata 0.5 outfile
   ...wait a while...
runtime summary - 5 runs of
% getdata 0.5 outfile
(working directory = /users/jerry/.src/getdata)

First run started at:  Thu Mar 19 09:33:58 EST 1992
Last run finished at:  Thu Mar 19 09:36:41 EST 1992
------------------------

RUN #   ***INDIVIDUAL RESULTS***
1       1.0u 7.4s 1:06 12% 0+108k 0+0io 0pf+0w
2       0.2u 0.8s 0:05 16% 0+128k 0+0io 0pf+0w
3       0.2u 1.3s 0:11 13% 0+116k 0+0io 0pf+0w
4       0.4u 2.7s 0:25 12% 0+108k 0+0io 0pf+0w
5       0.9u 5.9s 0:53 12% 0+108k 0+0io 0pf+0w

AVERAGES:
0.54u 3.62s 0:32 0+113k 0+0io 0pf+0w

It's good for testing different versions of a program to find the fastest (or slowest!). If you're writing a program that will run a lot, shaving 10% or 20% off its time can be worth the work.


Note that the command you run can't have any redirection in it; that's because runtime does some redirection of its own. You can redirect the output of runtime into a log file though, and run the whole mess in the background. For example:

% runtime -5 getdata 0.5 outfile > runtime.out &
[1] 12233

The summary will go to the runtime.out file.

posted by Brahma at 2:45 PM 0 comments

Know When to Be "nice" to Other Users...and When Not to

39.9 Know When to Be "nice" to Other Users...and When Not to

The nice command modifies the scheduling priority of time-sharing processes (for BSD and pre-V.4 releases of System V, all processes). The GNU version is on the CD-ROM (the disc's install system will only install nice if your system has the appropriate facilities).

If you're not familiar with UNIX, you will find its definition of priority confusing - it's the opposite of what you would expect. A process with a high nice number runs at low priority, getting relatively little of the processor's attention; similarly, jobs with a low nice number run at high priority. This is why the nice number is usually called niceness: a job with a lot of niceness is very kind to the other users of your system (i.e., it runs at low priority), while a job with little niceness will hog the CPU. The term "niceness" is awkward, like the priority system itself. Unfortunately, it's the only term that is both accurate (nice numbers are used to compute priorities but are not the priorities themselves) and avoids horrible circumlocutions ("increasing the priority means lowering the priority...").

Many supposedly experienced users claim that nice has virtually no effect. Don't listen to them. As a general rule, reducing the priority of an I/O-bound job (a job that's waiting for I/O a lot of the time) won't change things very much. The system rewards jobs that spend most of their time waiting for I/O by increasing their priority. But reducing the priority of a CPU-bound process can have a significant effect. Compilations, batch typesetting programs (troff, TeX, etc.), applications that do a lot of math, and similar programs are good candidates for nice. On a moderately loaded system, I have found that nice typically makes a CPU-intensive job roughly 30 percent slower and consequently frees that much time for higher priority jobs. You can often significantly improve keyboard response by running CPU-intensive jobs at low priority.

Note that System V Release 4 has a much more complex priority system, including real-time priorities. Priorities are managed with the priocntl command. The older nice command is available for compatibility. Other UNIX implementations (including HP and Concurrent) support real-time scheduling. These implementations have their own tools for managing the scheduler.

The nice command sets a job's niceness, which is used to compute its priority. It may be one of the most non-uniform commands in the universe. There are four versions, each slightly different from the others. BSD UNIX has one nice that is built into the C shell, and another standalone version that can be used by other shells. System V also has one nice that is built into the C shell and a separate standalone version.

Under BSD UNIX, you must also know about the renice(8) command (39.11); this lets you change the niceness of a job after it is running. Under System V, you can't modify a job's niceness once it has started, so there is no equivalent.
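For example (added here for illustration; the PID is hypothetical, and Solaris also accepts this POSIX-style -n form of renice):

% renice -n 4 -p 12345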

NOTE: Think carefully before you nice an interactive job like a text editor. See article 39.10.

We'll tackle the different variations of nice in order.

39.9.1 BSD C Shell nice

Under BSD UNIX, nice numbers run from -20 to 20. The -20 designation corresponds to the highest priority; 20 corresponds to the lowest. By default, UNIX assigns the nice number 0 to user-executed jobs. The lowest nice numbers (-20 to -17) are unofficially reserved for system processes. Assigning a user's job to these nice numbers can cause problems. Users can always request a higher nice number (i.e., a lower priority) for their jobs. Only the superuser (1.24) can raise a job's priority.

To submit a job at a greater niceness, precede it with the modifier nice. For example, the command:

% nice awk -f proc.awk datafile > awk.out

runs an awk command at low priority. By default, the csh version of nice will submit this job with a nice level of 4. To submit a job with an arbitrary nice number, use nice one of these ways:


% nice +n command
% nice -n command

where n is an integer between 0 and 20. The +n designation requests a positive nice number (low priority); -n requests a negative nice number. Only a superuser may request a negative nice number.

39.9.2 BSD Standalone nice

The standalone version of nice differs from C shell nice in that it is a separate program, not a command built in to the C shell. You can therefore use the standalone version in any situation: within makefiles (28.13), when you are running the Bourne shell, etc. The principles are the same. nice numbers run from -20 to 20, with the default being zero. Only the syntax has been changed to confuse you. For the standalone version, -n requests a positive nice number (lower priority) and --n requests a negative nice number (higher priority, superuser only). Consider these commands:

$ nice -6 awk -f proc.awk datafile > awk.out
# nice --6 awk -f proc.awk datafile > awk.out

The first command runs awk with a high nice number (i.e., 6). The second command, which can be issued only by a superuser, runs awk with a low nice number (i.e., -6). If no level is specified, the default argument is -10.

39.9.3 System V C Shell nice

System V takes a slightly different view of nice numbers. nice levels run from 0 to 39; the default is 20. The numbers are different but their meanings are the same: 39 corresponds to the lowest possible priority, and 0 is the highest. A few System V implementations support real-time submission via nice. Jobs submitted by root with extremely low nice numbers (-20 or below) allegedly get all of the CPU's time. Systems on which this works properly are very rare and usually advertise support for real-time processing. In any case, running jobs this way will destroy multiuser performance. This feature is completely different from real-time priorities in System V Release 4.

With these exceptions, the C shell version of nice is the same as its BSD cousin. To submit a job at a low priority, use the command:

% nice command

This increases the command's niceness by the default amount (4, the same as BSD UNIX); command will run at nice level 24. To run a job at an arbitrary priority, use one of the following commands:


% nice +n command
% nice -n command

where n is an integer between 0 and 19. The +n entry requests a higher nice level (a decreased priority), while -n requests a lower nice level (a higher priority). Again, this is similar to BSD UNIX, with one important difference: n is now relative to the default nice level. That is, the command:

% nice +6 awk -f proc.awk datafile > awk.out

runs awk at nice level 26.

39.9.4 System V Standalone nice

Once again, the standalone version of nice is useful if you are writing makefiles or shell scripts or if you use the Bourne shell as your interactive shell. It is similar to the C shell version, with these differences:

* With no arguments, standalone nice increases the nice number by 10 instead of by 4; this is a significantly greater reduction in the program's priority.

* With the argument -n, nice increases the nice number by n (reducing priority).

* With the argument --n, nice decreases the nice number by n (increasing priority; superuser only).

Consider these commands:

$ nice -6 awk -f proc.awk datafile > awk.out
# nice --6 awk -f proc.awk datafile > awk.out

The first command runs awk at a higher nice level (i.e., 26, which corresponds to a lower priority). The second command, which can be given only by the superuser, runs awk at a lower nice level (i.e., 14).

- ML from O'Reilly & Associates' System Performance Tuning, Chapter 3

39.10 A nice Gotcha


NOTE: It's NOT a good idea to nice a foreground job (12.1). If the system gets busy, your terminal could "freeze" waiting to get enough CPU time to do something. You may not even be able to kill (38.9) a nice'd job on a very busy system because the CPU may never give the process enough CPU time to recognize the signal waiting for it! And, of course, don't nice an interactive program like a text editor unless you like to wait... :-)

- JP

posted by Brahma at 2:45 PM 0 comments

What Makes Your Computer Slow? How Do You Fix It?

39.12 What Makes Your Computer Slow? How Do You Fix It?

Article 39.5 discussed the various components that make up a user's perception of system performance. There is another equally important approach to this issue: the computer's view of performance. All system performance issues are basically resource contention issues. In any computer system, there are three fundamental resources: the CPU, memory, and the I/O subsystem (e.g., disks and networks). From this standpoint, performance tuning means ensuring that every user gets a fair share of available resources.

Each resource has its own particular set of problems. Resource problems are complicated because all resources interact with one another. Your best approach is to consider carefully what each system resource does: CPU, I/O, and memory. To get you started, here's a quick summary of each system resource and the problems it can have.

39.12.1 The CPU

On any time-sharing system, even single-user time-sharing systems (such as UNIX on a personal computer), many programs want to use the CPU at the same time. Under most circumstances the UNIX kernel is able to allocate the CPU fairly; however, each process (or program) requires a certain number of CPU cycles to execute and there are only so many cycles in a day. At some point the CPU just can't get all the work done.

There are a few ways to measure CPU contention. The simplest is the UNIX load average, reported by the BSD uptime (39.7) command. Under System V, sar -q provides the same sort of information. The load average tries to measure the number of active processes at any time (a process is a single stream of instructions). As a measure of CPU utilization, the load average is simplistic, poorly defined, but far from useless.

Before you blame the CPU for your performance problems, think a bit about what we don't mean by CPU contention. We don't mean that the system is short of memory or that it can't do I/O fast enough. Either of these situations can make your system appear very slow. But the CPU may be spending most of its time idle; therefore, you can't just look at the load average and decide that you need a faster processor. Your programs won't run a bit faster. Before you understand your system, you also need to find out what your memory and I/O subsystems are doing. Users often point their fingers at the CPU, but I would be willing to bet that in most situations memory and I/O are equally (if not more) to blame.

Given that you are short of CPU cycles, you have three basic alternatives:

* You can get users to run jobs at night or at other low-usage times (ensuring the computer is doing useful work 24 hours a day) with batch or at (40.1).

* You can prevent your system from doing unnecessary work.

* You can get users to run their big jobs at lower priority (39.9).

If none of these options is viable, you may need to upgrade your system.

39.12.2 The Memory Subsystem

Memory contention arises when the memory requirements of the active processes exceed the physical memory available on the system; at this point, the system is out of memory. To handle this lack of memory without crashing the system or killing processes, the system starts paging: moving portions of active processes to disk in order to reclaim physical memory. At this point, performance decreases dramatically. Paging is distinguished from swapping, which means moving entire processes to disk and reclaiming their space. Paging and swapping indicate that the system can't provide enough memory for the processes that are currently running, although under some circumstances swapping can be a part of normal housekeeping. Under BSD UNIX, tools such as vmstat and pstat show whether the system is paging; ps can report the memory requirements of each process. The System V utility sar provides information about virtually all aspects of memory performance.

To prevent paging, you must either make more memory available or decrease the extent to which jobs compete. To do this, you can tune system parameters, which is beyond the scope of this book (see O'Reilly & Associates' System Performance Tuning by Mike Loukides for help). You can also terminate (38.10) the jobs with the largest memory requirements. If your system has a lot of memory, the kernel's memory requirements will be relatively small; the typical antagonists are very large application programs.

39.12.3 The I/O Subsystem

The I/O subsystem is a common source of resource contention problems. A finite amount of I/O bandwidth must be shared by all the programs (including the UNIX kernel) that currently run. The system's I/O buses can transfer only so many megabytes per second; individual devices are even more limited. Each kind of device has its own peculiarities and, therefore, its own problems. Unfortunately, UNIX has poor tools for analyzing the I/O subsystem. Under BSD UNIX, iostat can give you information about the transfer rates for each disk drive; ps and vmstat can give some information about how many processes are blocked waiting for I/O; and netstat and nfsstat report various network statistics. Under System V, sar can provide voluminous information about I/O efficiency, and sadp (V.4) can give detailed information about disk access patterns. However, there is no standard tool to measure the I/O subsystem's response to a heavy load.

The disk and network subsystems are particularly important to overall performance. Disk bandwidth issues have two general forms: maximizing per-process transfer rates and maximizing aggregate transfer rates. The per-process transfer rate is the rate at which a single program can read or write data. The aggregate transfer rate is the maximum total bandwidth that the system can provide to all programs that run.

Network I/O problems have two basic forms: a network can be overloaded or a network can lose data integrity. When a network is overloaded, the amount of data that needs to be transferred across the network is greater than the network's capacity; therefore, the actual transfer rate for any task is relatively slow. Network load problems can usually be solved by changing the network's configuration. Integrity problems occur when the network is faulty and intermittently transfers data incorrectly. In order to deliver correct data to the applications using the network, the network protocols may have to transmit each block of data many times. Consequently, programs using the network will run very slowly. The only way to solve a data integrity problem is to isolate the faulty part of the network and replace it.

39.12.4 User Communities

So far we have discussed the different factors that contribute to overall system performance. But we have ignored one of the most important factors: the users who submit the jobs.

In talking about the relationship between users and performance, it is easy to start seeing users as problems: the creatures who keep your system from running the way it ought to. Nothing is further from the truth. Computers are tools: they exist to help users do their work and not vice versa.

Limitations on memory requirements, file size, job priorities, etc., are effective only when everyone cooperates. Likewise, you can't force people to submit their jobs to a batch queue (40.6). Most people will cooperate when they understand a problem and what they can do to solve it. Most people will resist a solution that is imposed from above, that they don't understand, or that seems to get in the way of their work.

The nature of your system's users has a big effect on your system's performance. We can divide users into several classes:

* Users who run a large number of relatively small jobs: for example, users who spend most of their time editing or running UNIX utilities.

* Users who run a small number of relatively large jobs: for example, users who run large simulation programs with huge data files.

* Users who run a small number of CPU-intensive jobs that don't require a lot of I/O but do require a lot of memory and CPU time. Program developers fall into this category. Compilers tend to be large programs that build large data structures and can be a source of memory contention problems.

All three groups can cause problems. Several dozen users running grep and accessing remote filesystems can be as bad for overall performance as a few users accessing gigabyte files. However, the types of problems these groups cause are not the same. For example, setting up a "striped filesystem" will help disk performance for large, I/O-bound jobs but won't help (and may hurt) users who run many small jobs. Setting up batch queues will help reduce contention among large jobs, which can often be run overnight, but it won't help the system if its problems arise from users typing at their text editors and reading their mail.

Modern systems with network facilities (1.33) complicate the picture even more. In addition to knowing what kinds of work users do, you also need to know what kind of equipment they use: a standard terminal over an RS-232 line, an X terminal over Ethernet, or a diskless workstation? The X Window System requires a lot of memory and puts a heavy load on the network. Likewise, diskless workstations place a load on the network. Similarly, do users access local files or remote files via NFS or RFS?

posted by Brahma at 2:44 PM 0 comments

sed to clean script output

#!/bin/sh
# Public domain.

# Put CTRL-M in $m and CTRL-H in $b.
# Change \010 to \177 if you use DEL for erasing.
eval `echo m=M b=H | tr 'MH' '\015\010'`

exec sed "s/$m\$//
:x
s/[^$b]$b//
t x" $*
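Typical usage would be along these lines (a sketch; the file names are placeholders and the script is assumed to be saved as script.tidy and made executable):

% script.tidy typescript > typescript.clean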

You can also hack the sed script in script.tidy to delete some of your terminal's escape sequences (5.8); article 41.11 explains how to find these sequences. (A really automated script.tidy would read your termcap or terminfo entry and look for all those escape sequences in the script file.)

- JP

posted by Brahma at 2:44 PM 0 comments

Making an Arbitrary-Size File for Testing

51.11 Making an Arbitrary-Size File for Testing


The yes command (23.4) outputs text over and over. If you need a file of some size for testing, make it with yes and head (25.20). For example, to make a file 100k (102,400) characters long, with 12,800 8-character lines (7 digits and a newline), type:

% yes 1234567 | head -12800 > 100k-file

NOTE: On some UNIX systems, that command may "hang" and need to be killed with CTRL-c - because head keeps reading input from the pipe. If it hangs on your system, replace head -12800 with sed 12800q.

- JIK, JP

posted by Brahma at 2:43 PM 0 comments

route in Sun 10

Subject: SUMMARY: route in Sun 10

Thanks to all those who replied..

To make a permanent route one can use the following.
1. Use the file /etc/init.d/inetsvc, OR make a script in /etc/rc2.d with the route content like:

#!/sbin/sh
# S67static-route file
# static route table for
echo " Setting static routes "
route add 172.16.1.0 -netmask 255.255.255.0 10.7.0.240 > /dev/console
echo " Done with setting static routes "
#
# Done!

Some people also have suggested /etc/default/inetinit to put the route, but not tested.
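Worth adding (not from the original summary, so verify against your release's route(1M) man page): on Solaris 10 the route command also accepts a -p flag that records the route persistently, which avoids the rc-script approach entirely, e.g.:

route -p add 172.16.1.0 -netmask 255.255.255.0 10.7.0.240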

For the second question, one should use:

pkgchk -l -p /path/to/file

Or alternatively

grep FILENAME /var/sadm/install/contents

posted by Brahma at 2:43 PM 0 comments

Overview of Network Performance


Overview of Network Performance

* Congested or Collision - resend

The network has finite bandwidth, and so can only transmit a certain amount of data.

* EtherNet

10 Mbit/sec (about 14,400 packets/sec)

Minimum packet size: 64 bytes (hence about 14,400 packets/sec)

Maximum packet size : 1518 bytes

Inter-packet gap: 9.6 microseconds

30-40% utilization because of collision contention

* Latency

Not as important as disk latency

Must consider that remote system has resources including disk

* NFS

UDP: the common protocol in use, part of TCP/IP; allows fast network throughput with little overhead.

Logical packet size : 9Kbytes

On ethernet : 6 * 1518 bytes

After a collision, ALL of the several Ethernet packets have to be resent

* Slower remote server

The remote server is CPU bound

* Network Monitoring Tools

nfsstat

netstat


snoop

ping

spray

ping command

How to use ping

#> ping

Send a packet to a host on the network.

-s : send one packet per second

How to read ping -s output

* 2 single SPARCstations on a quiet Ethernet always respond with less than 1 millisecond

Example of ping

#> ping -s host
PING host: 56 data bytes
64 bytes from host (1.1.1.1): icmp_seq=0. time=7. ms
64 bytes from host (1.1.1.1): icmp_seq=1. time=7. ms

----host PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 1/2/7

spray command

How to use spray

#> spray

Send a one-way stream of packets to a host

Reports how many were received and the transfer RATE.

-c : count(number) of packets

-d : specifies the delay, in microseconds


Default: 9.6 microsecond

-l : specifies the Length(size) of the packet

How to read spray output

* If you use the -d option and many packets are dropped:

=> Check Hardware such as loose cables or missing termination

=> Check for a possibly congested network

=> Use the netstat command to get more information

Example of spray

#> spray -d 20 -c 100 -l 2048 host

sending 100 packets of length 2048 to host ...
no packets dropped by host
560 packets/sec, 1147576 bytes/sec

netstat -i command

How to use netstat -i

#> netstat -i 5

errs : the number of errors

packets : the number of packets

colls : the number of collisions

* Collision percentage rate = colls/output packets * 100

How to read netstat data

* collision percentage > 5% (one system)

=> Checking the network interface and cabling

* collision percentage > 5% (all systems)

=> The network is congested


* errs field has data

=> Suspect BAD hardware generating illegal sized packets

=> Check Repeated Network

Example of netstat

#> netstat -i 5
        input   le0     output            input   (Total) output
packets errs    packets errs    colls     packets errs    packets errs    colls
71853   1       27270   8       4839      72526   1       27943   8       4839
7       0       0       0       0         7       0       0       0       0
14      0       0       0       0         14      0       0       0       0
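As a small illustration of the collision-percentage formula above (a sketch added here, not from the original notes; it assumes the one-shot "netstat -i" layout where Opkts is field 7 and Collis is field 9, and le0 is just the example interface):

netstat -i | nawk '$1 == "le0" && $7 > 0 { printf("collision rate: %.1f%%\n", $9 / $7 * 100) }'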

snoop command

How to use snoop

#> snoop

Capture packets from the network

Display their contents

Example of snoop

#> snoop host1
#> snoop -o filename host1 host2
#> snoop -i filename -t r | more
#> snoop -i filename -p99,108
#> snoop -i filename -v -p101
#> snoop -i filename rpc nfs and host1 and host2

nfsstat -c command

How to use nfsstat -c

#> nfsstat -c

Display a summary of servers and client statistics

Can be used to IDENTIFY NFS problems

retrans : Number of remote procedure calls(RPCs) that were retransmitted


badxids : Number of times that a duplicate acknowledgement was received for a single NFS request

timeout : Number of calls that timed out

readlink : Number of reads to symbolic links

How to read nfsstat data

* % of retrans of calls > 5% : maybe a network problem

=> Looking for network congestion

=> Looking for overloaded servers

=> Check ethernet interface

* high badxid, as well as timeout : remote server is slow

=> Increase the time-out period

#> mount -o rw,soft,timeo=15 host:/home /home

* % of readlink of the calls > 10% : too many symbolic links

Example of nfsstat

#> nfsstat -c

Client rpc:
calls     badcalls  retrans   badxid    timeout   wait      newcred   timers
13185     0         8         0         8         0         0         50

Client nfs:
calls     badcalls  nclget    nclcreate
13147     0         13147     0
null        getattr     setattr     root        lookup      readlink    read
0 0%        794 6%      10 0%       0 0%        2141 16%    2720 21%    6283 48%
wrcache     write       create      remove      rename      link        symlink
0 0%        581 4%      33 0%       29 0%       4 0%        0 0%        0 0%
mkdir       rmdir       readdir     statf
0 0%        0 0%        539 4%      13 0%

Network Solutions

1st. Consider adding the Prestoserve NFS Write accelerator.


=> If the write % from nfsstat -s is > 15%, consider installing it.

2nd. Subnetting

=> If your network is congested, consider subnetting.

=> That is, if the collision rate is > 5%, consider subnetting.

3rd. Install the bridge

=> If your network is congested and physical segmentation is NOT possible.

=> Isolate physical segments of a busy network.

4th. Install local disks into diskless machines.

posted by Brahma at 2:42 PM 0 comments

last login

The third (preferred) method is to build or acquire a tool that will read and interpret the contents of the wtmpx file and display login records along with the year in which each login occurred. For example, you could locate source code for the last command and modify it to display the year along with the other date information. Alternately, you could download a copy of Ed Cashin's utxreader.c, compile the code on your system and produce output such as these two entries:

user=kendal:fmttime=20030317-133121:host=geo.lox.org:init_id=ts/2pts/2:device=pts/2:pid=29104:proc_type=7:term_status=0:exit_status=0:
user=stingray:fmttime=20030317-133912:host=world.std.com:init_id=t200pts/4:device=pts/4:pid=27102:proc_type=8:term_status=0:exit_status=0:

The date/time information in these records could benefit from a little reformatting, but it's fairly clear. The string "20030317-133121" refers to 03/17/2003 at 13:31:21. Notice that the older records appear first, unlike the last command's output, which is reversed so that the more relevant entries appear first.

To track down a particular user's logins using utxreader, you would have to pipe the program's output to a grep command like this:

# ./utxreader /var/adm/wtmpx | grep sbob

To look at his most recent logins, you could also pipe the output to tail:

# ./utxreader /var/adm/wtmpx | grep sbob | tail -3


Source for utxreader.c is available at http://noserose.net/e/code/.

Another option is to build a script in Perl. This could be a very tricky process. After all, information contained in wtmpx records is not stored in string format. Fortunately, however, details on how to read and parse this file using Perl are included in O'Reilly's "Perl for System Administration" (David N. Blank-Edelman, 2000). In fact, a sample chapter from this excellent book (also referred to as "the otter book") is available at this URL:

http://www.oreilly.com/catalog/perlsysadm/chapter/ch09.html

"Stealing" Great Perl Code

The sample chapter from the otter book just happens to include code for extracting information from the wtmpx file and displaying system reboots (look under the heading "Stream Read-Count") and explains how the unpack command can be used to read and parse wtmpx records. Basically, the script defines a template which it then provides to the unpack command so that each field in a wtmpx record can be extracted and assigned to a separate variable.

I found that the script could be easily modified to print user logins instead of reboots. To do this, I changed these two lines:

if ($ut_line eq "system boot"){
    print "rebooted ".scalar localtime($tv_sec)."\n";

to these:

if ($ut_user eq "$username"){
    print "$username ".scalar localtime($tv_sec)."\n";

Not much of a challenge, huh?

I also added code to prompt for, read and chomp (trim the newline character from) the username:

print "username> ";
$username=<STDIN>;
chomp $username;

Output from the unmodified sample code looks like this:

boson> ./showreboots
rebooted Tue Aug 19 20:39:28 2003
rebooted Sat Nov  8 08:52:30 2003
rebooted Thu Mar 11 19:08:34 2004
rebooted Wed Dec  1 15:24:46 2004
rebooted Thu Feb 10 05:22:01 2005

Notice that the date includes the year.


The showlogins script that I based on this script prompts for a username and displays similarly formatted information:

boson> ./showlogins
username> jdoe
jdoe Wed Apr 23 15:23:05 2003
jdoe Wed Apr 23 15:23:21 2003
jdoe Fri Apr 25 09:42:10 2003
jdoe Fri Apr 25 12:07:07 2003
jdoe Fri Apr 25 12:16:35 2003

Again, notice that the output is not reversed but, instead, appears in the order in which the records appear in the /var/adm/wtmpx file. This means that the most recent records will appear last. The showlogins script could be modified to look for the username on the command line, making it more amenable for use within other scripts. You might, for example, want to display the most recent login activity in a script that you use to disable an account.

You could use lines like these to ensure that a username has been supplied:

# verify that a username has been provided
( 0 == $#ARGV ) or die "usage: $0 username";
$username=$ARGV[0];

NOTE: Since Perl arrays start with element 0, these lines are checking for and using the first and only parameter. Given this code change, you can retrieve only the most recent login like this:

boson> ./showlogins sbob | tail -1
sbob Thu Oct 28 09:21:26 2004

If you take a look at the Perl scripts provided in the sample chapter from the otter book, you will probably be surprised at how easily the required data can be extracted from the wtmpx file.

posted by Brahma at 2:42 PM 0 comments

Managing Library Paths with crle

Managing Library Paths with crle

To begin with, crle stands for "configure runtime linking environment". It is a command that allows Solaris sysadmins to better manage their dynamic linker. Specifically, it allows you to configure library paths so that programs run on a system will have as easy access to shared library files in locations like /usr/local/lib or /opt/lib as they do to /usr/lib. Instead of configuring an LD_LIBRARY_PATH to give these lib directories visibility, you run the crle command and augment the load library path on a system-wide basis.


In other words, the crle command is basically the equivalent of updating ld.config.

What's Configured Now?

To view the system library path on a Solaris system, you can issue the crle command on a line by itself:

# crle

Default configuration file (/var/ld/ld.config) not found
Default Library Path (ELF):   /usr/lib  (system default)
Trusted Directories (ELF):    /usr/lib/secure  (system default)

Notice that, in the absence of an ld.config file, a limited library search path is established containing only /usr/lib for normal usage.

This library path setting is independent of any paths you may have added to your LD_LIBRARY_PATH interactively or through settings in your dot files. The paths shown are the paths that binaries on your system will use regardless of the LD_LIBRARY_PATH setting.

How to Configure New Paths

To add a new path to your dynamic linker, you would use the crle -l command, but this command overwrites the existing path. In other words, you need to repeat the existing path elements so as not to remove them. For example, you might type the following command to add /usr/local/lib to what is shown above:

# crle -l /usr/lib:/usr/local/lib

Afterwards, you should verify the new settings:

# crle

Configuration file [2]: /var/ld/ld.config
Default Library Path (ELF):   /usr/lib:/usr/local/lib
Trusted Directories (ELF):    /usr/lib/secure  (system default)

Command line:
  crle -c /var/ld/ld.config -l /usr/lib:/usr/local/lib
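A side note (an assumption to verify against your release's crle man page): newer crle versions also offer an update mode, -u, which appends to the existing configuration instead of replacing it, avoiding the need to retype the whole path:

# crle -u -l /usr/local/lib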

What should you do if You Break It?


If you type your crle command carefully, you should have an easy time augmenting your dynamic loader's search path. If, on the other hand, you break the path -- for example, by typing crle -l followed only by the paths you intend to add -- you can send your Solaris system into a very troublesome state. The reason is simple; just about every command that you type on your Unix system depends on shared object files stored in /usr/lib -- including commands as benign as ls. So, if /usr/lib disappears from your linker's search path, your ability to work wonders on the command line will come to an abrupt end -- at least until you type the command again. The breakage will be illustrated with errors such as this:

$ ls
ld.so.1: ls: fatal: libc.so.1: open failed: No such file or directory
Killed

Normal Unix command functionality can be restored by typing a second crle command. While you could also fix the problem for yourself by setting your LD_LIBRARY_PATH as shown below, this wouldn't help anyone else who logs into the system.

LD_LIBRARY_PATH=/lib:/usr/lib;export LD_LIBRARY_PATH

Typing the crle command with the complete library path should fix the problem immediately and easily -- except when you can't issue it.

What should you do if You Break It with sudo?

A second crle won't help you out of the pit that you inadvertently created if your super powers are bequeathed through sudo. In this case, you won't be able to repair the linker's path with a "sudo crle" command because sudo itself depends on /usr/lib.

$ sudo crle -l /lib:/usr/lib:/usr/local/lib
ld.so.1: sudo: fatal: libdl.so.1: open failed: No such file or directory
Killed

In fact, remote access to the box will also fail because telnet, ftp and ssh will try to access shared libraries which are no longer in the path.

$ telnet boson
Trying 10.11.11.51...
Connected to boson.particles.org.
Escape character is '^]'.
ld.so.1: in.telnetd: fatal: libdl.so.1: open failed: No such file or directory
Connection closed by foreign host.

And, while you can restore the functionality of many commands by setting your LD_LIBRARY_PATH, this won't restore remote access -- and it won't restore your sudo privileges because sudo, for fairly obvious security reasons, doesn't pay attention to this environment variable.

While you can appeal to a higher authority (the guy who can exercise superuser privilege without using sudo), there's also a workaround that you can use to regain your superuser access without having to wait for someone with the root password to come to your rescue.

The trick here is to set the LD_NOCONFIG environment variable to some (any) value, thus causing the runtime linker to ignore configuration files. This returns your operational library path to the default /usr/lib, thus allowing your sudo command to perform the repair of the hosed library path.

bash$ LD_NOCONFIG=true; export LD_NOCONFIG
bash$ sudo crle -l /lib:/usr/lib:/usr/local/lib
Password:

Where Did crle Come From?

The crle command has existed in Solaris since Solaris 7, but it is most familiar to systems administrators who need to augment library paths when installing applications that put libraries in non-system locations such as /usr/local/lib or /opt/lib. Since it establishes system-wide library paths, it is recommended over LD_LIBRARY_PATH settings.

posted by Brahma at 2:41 PM 0 comments

hme Troubleshooting

hme Troubleshooting

To examine or set a specific hme interface (hme#), specify it as follows:

ndd -set /dev/hme instance #

To examine parameter values, use an ndd -get command. In particular, link_speed reveals whether the interface is operating in 10Mbps or 100Mbps mode (settings of 0 and 1, respectively), and link_mode reveals whether it is running at half or full duplex (0 and 1, respectively).

ndd -get /dev/hme link_speed
ndd -get /dev/hme link_mode
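A small convenience sketch (added here; the instance range is an assumption -- adjust it to the hme interfaces actually present on the system):

#!/bin/sh
# report speed/duplex for hme instances 0-3
for i in 0 1 2 3; do
    ndd -set /dev/hme instance $i 2>/dev/null || continue
    speed=`ndd -get /dev/hme link_speed`
    mode=`ndd -get /dev/hme link_mode`
    echo "hme$i: link_speed=$speed (0=10Mb,1=100Mb)  link_mode=$mode (0=half,1=full)"
done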

To perform a hard set of the link_speed and link_mode parameters for hme#, several other parameters must be set with the ndd -set command.


In these parameter names, 10 or 100 refers to link_speed, fdx or hdx refers to link_mode, and autoneg refers to autonegotiation capability. The setting corresponding to the desired mode should be set to "1" and all other parameters should be set to "0". (adv_autoneg_cap=1 is the default.) For example, to set hme# to 100/full duplex:

ndd -set /dev/hme instance #
ndd -set /dev/hme adv_100T4_cap 0
ndd -set /dev/hme adv_100fdx_cap 1
ndd -set /dev/hme adv_100hdx_cap 0
ndd -set /dev/hme adv_10fdx_cap 0
ndd -set /dev/hme adv_10hdx_cap 0
ndd -set /dev/hme adv_autoneg_cap 0

ndd -set commands can be used in the /etc/rc2.d/S69inet initialization script to set the interface mode during boot rather than allowing autonegotiation. This can be useful if one of the interfaces is connected to an older switch that is not autonegotiating the line speed or mode correctly.
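As a rough sketch only (the hme instance 0 is an assumption, not from the original), such a boot-time fragment might look like:

#!/sbin/sh
# force hme instance 0 to 100 Mb full duplex instead of autonegotiating
ndd -set /dev/hme instance 0
ndd -set /dev/hme adv_100T4_cap 0
ndd -set /dev/hme adv_100fdx_cap 1
ndd -set /dev/hme adv_100hdx_cap 0
ndd -set /dev/hme adv_10fdx_cap 0
ndd -set /dev/hme adv_10hdx_cap 0
ndd -set /dev/hme adv_autoneg_cap 0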

The line speed and mode can also be set for all hme interfaces on the system by setting the following in the /etc/system file and rebooting. (Note: The following are Sun's official recommendations. It may not be necessary to set every parameter to either 0 or 1, but it is easiest to get support when following instructions to the letter.):

* 100 Mb, full duplex:

set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=1

* 100 Mb, half duplex:

set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=0
set hme:hme_adv_100hdx_cap=1

* 10 Mb, full duplex:

set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=0
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10fdx_cap=1
set hme:hme_adv_10hdx_cap=0

* 10 Mb, half duplex:


set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100fdx_cap=0
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10fdx_cap=0
set hme:hme_adv_10hdx_cap=1

Hardware Testing
One elementary test for the ethernet hardware is to invoke the watch-net-all command from the ok> PROM monitor prompt. This performs some simple diagnostics and listens on the ethernet port. Since this is at a much lower level than the driver level, it can be used to distinguish between hardware and software problems.

Debug Mode
The hme, qfe and be ethernet drivers can be switched into debug mode. This can be done in the /etc/system file by adding:

set hme:hmedebug=1

Alternatively, this can be done on a live system using adb:

adb -kw /dev/ksyms /dev/mem
hmedebug/D     (to display current value)
hmedebug/W 1   (be careful to use uppercase "W")

Debug mode permits the driver to display messages to the console. Some of the messages are informational, others are error messages. When examining the messages, remember that you are seeing one system's view of the network, and that this bias needs to be taken into account.

posted by Brahma at 2:40 PM 0 comments

network card

Thanks to everyone who sent me suggestions. I got this suggestion from Mr Xu Ying, which solved my problem.

You need to make sure that you have the correct netmask. If the other servers have a broadcast of 10.200.17.255, your netmask should be 255.255.255.0 (ffffff00). You can change both the netmask and broadcast from the command line:

ifconfig eri1 netmask 255.255.255.0 broadcast + up

posted by Brahma at 2:40 PM 0 comments


PATCH EXIT CODES

> Don't know how current this is, but here's the list I saved
> (my notes say INFODOC ID: 17973, but I'm not sure):

> PATCH EXIT CODES, taken from an installpatch script with the date of:

> # @(#) installpatch 6.9 98/10/09 SMI

> # Exit Codes:
> #  0 No error
> #  1 Usage error
> #  2 Attempt to apply a patch that's already been applied
> #  3 Effective UID is not root
> #  4 Attempt to save original files failed
> #  5 pkgadd failed
> #  6 Patch is obsoleted
> #  7 Invalid package directory
> #  8 Attempting to patch a package that is not installed
> #  9 Cannot access /usr/sbin/pkgadd (client problem)
> # 10 Package validation errors
> # 11 Error adding patch to root template
> # 12 Patch script terminated due to signal
> # 13 Symbolic link included in patch
> # 14 NOT USED
> # 15 The prepatch script had a return code other than 0.
> # 16 The postpatch script had a return code other than 0.
> # 17 Mismatch of the -d option between a previous patch install and the current one.
> # 18 Not enough space in the file systems that are targets of the patch.
> # 19 $SOFTINFO/INST_RELEASE file not found
> # 20 A direct instance patch was required but not found
> # 21 The required patches have not been installed on the manager
> # 22 A progressive instance patch was required but not found
> # 23 A restricted patch is already applied to the package
> # 24 An incompatible patch is applied
> # 25 A required patch is not applied
> # 26 The user specified backout data can't be found
> # 27 The relative directory supplied can't be found
> # 28 A pkginfo file is corrupt or missing
> # 29 Bad patch ID format
> # 30 Dryrun failure(s)
> # 31 Path given for -C option is invalid
> # 32 Must be running Solaris 2.6 or greater
> # 33 Bad formatted patch file or patch file not found
> # 34 The appropriate kernel jumbo patch needs to be installed

posted by Brahma at 2:39 PM 0 comments

Mounting Tape drive

RE: Mounting Tape drive
Posted By chris.barbot On Wednesday, August 31, 2005 at 4:47 PM

NetBackup does allow you to multi-thread your backup sessions so that you can have multiple streams from multiple servers writing to that tape device at the same time. Without the software, I'm not sure how you can go about doing that. I'd have a hard time believing that NB is the only one that can do this, though.

posted by Brahma at 2:38 PM 0 comments

Wednesday, August 17, 2005

Issue with join command

Re: Issue with join command

Try this:

file1 contents:
file1 - line1
file1 - line2
file1 - line3
file1 - line4

file2 contents:
file2 - line1
file2 - line2
file2 - line3
file2 - line4

Shell script append:

#! /bin/ksh
#
# append line from file2 to end of line from file1
#
awk '{
FRONT = $0
getline BACK < "file2"
print FRONT BACK
}' file1

Output:

./append
file1 - line1file2 - line1
file1 - line2file2 - line2
file1 - line3file2 - line3
file1 - line4file2 - line4

No space between FRONT and BACK.

Don
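(As an aside, not from the original thread: the same no-space concatenation can also be done with paste, assuming a POSIX paste that accepts the empty "\0" delimiter:)

paste -d '\0' file1 file2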

posted by Brahma at 1:03 PM 1 comments

Missing libraries when compiling with gcc

Missing libraries when compiling with gcc

I'm stumped on this one, and can't seem to (easily) find any solutions.

There are times when I try to compile open source packages which reference libraries such as -lnet, -lgd, etc. In the most recent case, -lsnmp.

I have crle set to properly access all of my library directories, and have even tried to force gcc to look at specific directories using CFLAGS=-L... -I... I'm sure this is an easy-to-overcome problem, and the solution is probably right in my face -- but I can't see it.

In my most recent case, I'm trying to recompile apcupsd with Net-SNMP. I'm using the Net-SNMP package from SFW, which installs in /usr/local. /usr/local/lib is in my crle reference, but the autoconf (configure) script fails when trying to access -lsnmp or -lnetsnmp (I think that last one is right, I cleared the screen yesterday :)

Any ideas or pointers in the right direction would be greatly appreciated, and will earn a beer when in my neighborhood.

-- Alan W. Rateliff, II


> solution is probably right in my face -- but I can't see it.

CFLAGS tells the compiler where to find header files; crle tells the run-time linker where to find libraries when you have failed to link your binaries correctly. You have to set LDFLAGS to tell the compile-time linker where to find libraries, and you use both -L and -R arguments.

For further enlightenment read the man pages ld(1) and ld.so.1(1).

> right, I cleared the screen yesterday :)

What does net-snmp-config print when invoked as:

% net-snmp-config --libs

?

> For further enlightenment read the man pages ld(1) and ld.so.1(1).

I have the same problem, but I'm a newbie and not very clear on what you mean. Could you please elaborate a bit more on your reply? Thanks a lot!

> mean. Could you please elaborate a bit more on your reply? Thanks a lot!

Elaborate on what matter? Setting up crle is never required for binaries which you build on the system properly. Moreover, if you set up crle incorrectly you are going to be learning how to boot from CDROM to fix your system.

CFLAGS and LDFLAGS are set in your environment, and properly written configure scripts will use them to create the Makefiles. On this system:

[~]$ echo $CFLAGS; echo $LDFLAGS
-O2 -pipe -mcpu=ultrasparc
-L/usr/sfw/lib -R/usr/sfw/lib -L/opt/sfw/lib -R/opt/sfw/lib -L/usr/openwin/lib -R/usr/openwin/lib

> For further enlightenment read the man pages ld(1) and ld.so.1(1).

I have played with all of the above for a few hours and turned up the following results. I also want to note that in the past I had been admonished to not use env variables during compile or run, thus my reliance upon crle.

Browsing through config.log I determined that I was using the wrong version of OpenSSL for this specific instance. I installed a new package and tried again. The configure script failed again, but this time I determined that several kstat_* references in libnetsnmp.so were UNDEF'd. libnetsnmp.so was linked with -lkstat. Now I'm just working to resolve the -lkstat issue easily.

I tried a few other programs which were giving me headaches in the past and LDFLAGS seems to cure what ails them.

--

> I have played with all of the above for a few hours and turned up the
> following results. I also want to note that in the past I had been
> admonished to not use env variables during compile or run, thus my reliance
> upon crle.

Whoever told you that did not understand what he was doing.

> Browsing through config.log I determined that I was using the wrong version
> of OpenSSL for this specific instance. I installed a new package and tried
> again. The configure script failed again, but this time I determined that
> several kstat_* references in libnetsnmp.so were UNDEF'd. libnetsnmp.so was
> linked with -lkstat. Now I'm just working to resolve the -lkstat issue
> easily.

Since libkstat.so is in /usr/lib I expect that your Makefile does not contain that -lkstat directive. Take a close look at the last line of your make output starting with "gcc" to see if it contains "-lkstat".

> I tried a few other programs which were giving me headaches in the past and
> LDFLAGS seems to cure what ails them.

Solaris binaries are ELF. The headers of the binaries are supposed to contain the library search paths. They get those paths only if *you* link them correctly, generally using the -R argument in your LDFLAGS. Some configure scripts, notably the one in BIND, ignore your environment and you need to edit the Makefiles in order to provide the runtime link paths.
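To see whether a binary actually recorded a runpath, something along these lines should work (the binary name is just a placeholder):

dump -Lv /usr/local/bin/someprog | grep PATH      (look for RPATH/RUNPATH entries)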

> your make output starting with "gcc" to see if it contains "-lkstat".

In my case, libnetsnmp.so was linked with -lkstat, and a few others. So I

export LIBS="-lkstat -lm -lgen"

and voila. A shove in the right direction can work wonders. Thanks.

posted by Brahma at 1:03 PM 0 comments


What are three phases of System Installation?

> What are three phases of System Installation?

Diagnostics (optional)
Disk Partitioning
Installing
Updating/Patching
Restoring the previous files
Customizing

posted by Brahma at 1:00 PM 0 comments

multiple-thread application

==============================================================================
TOPIC: can I look at the thread for a executable run?
http://groups.google.com/group/comp.unix.solaris/browse_thread/thread/fede83991217c7
==============================================================================

== 1 of 1 ==
Date: Tues 9 Aug 2005 08:53
From: Casper H.S. Dik

John <[email protected]> writes:

>Hi,

>I have a multiple-thread application on solaris 10, wonder how can I
>tell from os level how many threads this app is running?

"ps -o nlwp" (and more so you can tell which process is running with that many lwps)

Casper
--
Expressed in this posting are my opinions. They are in no way related to opinions held by my employer, Sun Microsystems. Statements on Sun products included here are not gospel and may be fiction rather than truth.

ps -o nlwp -p <pid>
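For completeness, a couple of variants (the pid is a placeholder):

ps -o pid,nlwp,comm -p <pid>       (one line, with the LWP count)
ps -L -o pid,lwp,comm -p <pid>     (one line per LWP)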

posted by Brahma at 12:55 PM 0 comments


Need help to tar files in a directory older(creation date) than 45 days old

Need help to tar files in a directory older(creation date) than 45 days old

Need help to tar files in a directory older (creation date) than 45 days old. Say there are 1000 files in the directory and only 300 of them are older than 45 days; I want to archive (tar) only the 300 older files and leave the rest. Filename structure is:

filename.dat.Z (they have been compressed already)

I just can't seem to get anything I try to work correctly. Please help!

I was trying combinations of find and tar, but it wasn't working right.


Michael Tosch Aug 11, 4:45 pm
> I was trying combinations of find and tar, but it wasn't working right.

Linux:
find . -type f -mtime +45 -print0 | xargs -0 tar cf ../file.tar

Solaris, HPUX11.11:
find . -type f -mtime +45 -exec tar cf ../file.tar {} +

If none of the above works, and file names do not have spaces:
find . -type f -mtime +45 -print | xargs tar cf ../file.tar

-- Michael Tosch @ hp : com


Jim Aug 11, 4:56 pm

> find . -type f -mtime +45 -print | xargs tar cf ../file.tar

Wouldn't the above methods overwrite "../file.tar" each time the max # of arguments to 'tar' is reached (first and third cases, due to repeating the tar invocation with a new list of filename arguments), or for each new file found (second case, one tar invocation per file found)?



Michael Heiming Aug 11, 5:16 pm

> find . -type f -mtime +45 -print0 | xargs -0 tar cf ../file.tar

Without xargs:

find . -type f -mtime +45 -print | tar -cf file.tar -T -

Seems we are missing something like uuox award? ;-)

[..]
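(Another way around the overwrite issue with Solaris tar, not mentioned in the thread, is to build the file list first and feed it to tar's -I include-file option; the file names here are placeholders:)

find . -type f -mtime +45 -print > /tmp/tarlist
tar cf ../file.tar -I /tmp/tarlist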

posted by Brahma at 12:43 PM 0 comments

Check_Cable

> is there a way within solaris 10 to determine if a network cable is
> plugged in the socket of an ethernet interface (connection LED is green)?

Yes. For example:

# kstat -n hme0 | grep link_up
link_up                         1

link_up = 0 means no link; link_up = 1 means the link is up.

It does not directly tell you if it is plugged in or not, but you can deduce that if link_up = 0, it is probably unplugged, plugged into the wrong interface, or has a bad cable.

or

bash-2.05# ndd -get /dev/hme link_status
1
bash-2.05#

It'll give you an idea if there is a physically connected device plugged in, anyway!

posted by Brahma at 12:42 PM 0 comments

patch cluster you have look at


RE: [Solaris-l] Re: Issues with Solaris 9
Posted By hyattdj On Friday, August 12, 2005 at 12:46 PM

Have you installed the patch cluster available on, I believe, www.sun.com? To tell what patch cluster you have, look at uname -a: there is a number separated by a dash; the first number identifies the patch cluster, and the second number (after the dash) is the patch cluster version. There are usually 4 clusters released each year for each supported Sun OS. If the latest cluster is 18 and uname -a gives you 04, you are about 4 years behind in patches. The media kit from Sun will have the latest patch cluster available at build time integrated into it. So if you have maintenance, call them every 3-6 months and have them push you your FREE media kit (free only if you have a maintenance contract, and for each contract you can get Sol8, Sol9 and Sol10 each quarter).
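For illustration only (the host name and patch level below are made up), the dash-separated number the poster refers to appears in the kernel field of uname -a:

$ uname -a
SunOS myhost 5.9 Generic_118558-11 sun4u sparc SUNW,Ultra-60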

posted by Brahma at 12:41 PM 0 comments

ufsdump/ufsrestore file system

ufsdump/ufsrestore file system

> The root disk has some bad sectors. I am wondering if using
> ufsdump/ufsrestore to dump the file systems on the root disk to another
> disk, will the data on the bad sectors be lost? (As when dumping the
> file system, there are messages saying "cannot read block XXXXXX".)

> Any better way to perform in this situation?

I recently faced the same situation. I wound up using the repair facility in the format(1M) command. It took an excruciatingly painful amount of time to actually determine WHICH blocks were faulty, but I wound up making it work. All work was done on a 40MHz IPX Station under Solaris 7.

One thing I noticed, and it may be isolated to the Seagate drive I was using, is that the repairs did not survive a reboot. Had me WTF'ing all night.

I was still able to boot off the bad drive, it just ran into major issues after booting. So I booted into single-user mode, looking something like this:

Installed the new drive as device 0 (the boot device is 3). Formatted, partitioned, and newfs'd the drive. Ran "format" and repaired the bad sectors on the boot device. Then, performed a command like this:


cd /newdriveroot; ufsdump 0f - / | ufsrestore rf -

Once done (with all filesystems, in my case) I moved the new drive to device 3, and removed the old, bad drive.

Now, I did not have a working CD drive from which to boot, so I can only assume that booting from CD and doing the work might have saved me hassle. All-in-all, everything came over smoothly.

As an aside, I had to use a 68-pin drive to replace the old 50-pin drive. Anyone familiar with the IPX Stations will tell you there is absolutely NO room for any kind of drive adapter on the back of the drive. So, I improvised. I used a 50F-68F adapter right in the on-board 50-pin SCSI port, then used a 68-pin cable to the drive. I can't imagine I'm the first to do this, but I haven't seen any mention of anyone else. Works like a bloody charm, and breathed some extra life into my little server :)

-- Alan W. Rateliff, II


Michael Tosch Aug 12, 7:22 pm

>>The root disk has some bad sectors. I am wondering if using
>>ufsdump/ufsrestore to dump the file systems on the root disk to another
>>disk, will the data on the bad sectors be lost? (As when dumping the
>>file system, there are messages saying "cannot read block XXXXXX".)

>>Any better way to perform in this situation?

> I recently faced the same situation. I wound up using the repair facility
> in the format(1M) command. It took an excruciatingly painful amount of time
> to actually determine WHICH blocks were faulty, but I wound up making it
> work. All work was done on a 40MHz IPX Station under Solaris 7.

I suggest
format> analyze
and perform a read test. The default setting is to repair any bad sectors.



> One thing I noticed, and it may be isolated to the Seagate drive I was
> using, is that the repairs did not survive a reboot. Had me WTF'ing all
> night.

> I was still able to boot off the bad drive, it just ran into major issues
> after booting. So I booted into single-user mode, looking something like
> this:

> Installed the new drive as device 0 (the boot device is 3). Formatted,
> partitioned, and newfs'd the drive. Ran "format" and repaired the bad
> sectors on the boot device. Then, performed a command like this:

> cd /newdriveroot; ufsdump 0f - / | ufsrestore rf -

> Once done (with all filesystems, in my case) I moved the new drive to device
> 3, and removed the old, bad drive.

You will certainly have to make the new disk bootable:

man installboot

> Now, I did not have a working CD drive from which to boot, so I can only
> assume that booting from CD and doing the work might have saved me hassle.
> All-in-all, everything came over smoothly.

> As an aside, I had to use a 68-pin drive to replace the old 50-pin drive.
> Anyone familiar with the IPX Stations will tell you there is absolutely NO
> room for any kind of drive adapter on the back of the drive. So, I
> improvised. I used a 50F-68F adapter right in the on-board 50-pin SCSI
> port, then used a 68-pin cable to the drive. I can't imagine I'm the first
> to do this, but I haven't seen any mention of anyone else. Works like a
> bloody charm, and breathed some extra life into my little server :)

What an old iron! It's more than 10 years ago that I maintained IPC and IPX stations, but I still have some of these tiny onboard fuses in my drawer.

posted by Brahma at 12:40 PM 0 comments

Segmentation Fault

Ah, no, I wouldn't say that. "Segmentation Fault" is just never an acceptable way to report an error to a user, no matter how botched the configuration file might be.

The possible causes (not necessarily exclusive of each other) I can think of are:


- bug in inetd

- bad patch installed

- corrupted system files (libraries or the inetd executable itself) -- can be caused by copying system files from one system to another rather than using the expected patching and packaging tools

- someone removed a normal system file and replaced it with something else (a frightening number of people think this is a "reasonable" way to administer systems)

- the system has been compromised and damaged

"pstack" and "mdb" should help in debugging the core file. If not, then report the problem through the usual support channels.

> Assuming there's nothing wrong
> with the hardware (what caused the crash?), you could try and move the
> production inetd.conf file out of the way and see if inetd still
> crashes with a minimal inetd.conf.

If you're completely desperate and have no support contract and no normal means of debugging the problem, I suppose this might possibly help.

Even if it did help, it would leave me with a queasy feeling in my stomach if it were my system: how on earth do you know you've really solved the problem if you haven't discovered the root cause? How do you know it won't come back later?
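(A minimal sketch of the pstack/mdb suggestion above; the file paths are placeholders:)

pstack ./core                     (print the stack trace of the dumped process)
mdb /usr/sbin/inetd ./core        (then ::status and $C at the mdb prompt)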

posted by Brahma at 12:38 PM 0 comments

su - leaves me as myself, not root

su - leaves me as myself, not root

This quirk just started about a week ago. Never had a problem before.

As a normal user I cannot become the root user:

$ uname -a
SunOS ultra 5.6 Generic_105181-15 sun4u sparc SUNW,Ultra-1
$ id
uid=1000(roger) gid=10(staff)

$ su -
Password: [enter in root's password]
Sun Microsystems Inc. SunOS 5.6 Generic August 1997
$ id
uid=1000(roger) gid=10(staff)
$ echo hello >> /junk
/junk: cannot create
$

But this works:

$ id
uid=1000(roger) gid=10(staff)
$ su
Password: [enter in root's password]
# id
uid=0(root) gid=1(other)
# echo hello > /junk
# cat /junk
hello
# rm /junk
# exit
$ id
uid=1000(roger) gid=10(staff)
$

Suggestions?


John Howells Aug 16, 2:33 am
> Suggestions?

Presumably if you "echo $$" you are in the same shell for the "su -" case? The difference between "su -" and "su" is that the environment is set up for the former, so presumably something in root's .profile is terminating the su and putting you back to the original shell. Try renaming root's .profile (and any other startup stuff if you have changed root's shell from sh) and then see what happens.

John Howells

posted by Brahma at 12:38 PM 0 comments


link utilisation,

John Smith wrote:
> The linux version of sar (-A option for all) provides ethernet usage among
> other things. I know the solaris version doesn't, but I mention it here
> since I am looking for a similar output. Is there a tool available that will
> give me ethernet link utilization on a particular interface every 3 seconds?

> Thanks,
> JS

Not exactly link utilisation, but knowing the interface speed, you can extrapolate from:

kstat [-n <interface>] -T d -s '[or]bytes*' 3
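As a rough illustration only (the hme0 interface and the 100 Mb/s speed are assumptions): sample the output byte counter twice and do the arithmetic by hand.

kstat -p -n hme0 -s 'obytes*'; sleep 3; kstat -p -n hme0 -s 'obytes*'
utilisation (%) = (delta bytes * 8 * 100) / (3 seconds * 100,000,000 bits/s)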


posted by Brahma at 12:37 PM 0 comments

Configure NIC

You can add your entries to configure the NIC in the following file: /etc/rcS.d/S30rootusr.sh

The syntax is:
ndd -set /dev/"interface" adv_autoneg_cap 0/1
ndd -set /dev/"interface" adv_1000fdx_cap 0/1
ndd -set /dev/"interface" adv_1000hdx_cap 0/1
ndd -set /dev/"interface" adv_100fdx_cap 0/1
ndd -set /dev/"interface" adv_100hdx_cap 0/1
ndd -set /dev/"interface" adv_10fdx_cap 0/1
ndd -set /dev/"interface" adv_10hdx_cap 0/1

0 = disable
1 = enable
Interface: eri0, qfe0-3, ce, hme0, etc.

Alternatively, you can create a script in the /etc/init.d directory and link it to /etc/rc#.d, where # can be 2 or 3 depending on which run state you prefer. A sketch of the linking step follows below.
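A minimal sketch of that approach (the script and link names here are made up; the ndd -set lines shown in the reply further below would go inside the script):

chmod 744 /etc/init.d/nddconfig
ln /etc/init.d/nddconfig /etc/rc2.d/S31nddconfig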

Hi,
Firstly check the status, speed & mode:
# ndd -get /dev/qfe link_status
1 = up
0 = down
# ndd -get /dev/qfe link_speed
1 = 100 Mb
0 = 10 Mb
# ndd -get /dev/qfe link_mode
1 = Full Duplex (FDX)
0 = Half Duplex (HDX)

Then... these commands are usually placed in a startup script such as /etc/rc2.d/S99eri.

To force 100Mbs Full Duplex (FDX) on eri (example):

ndd -set /dev/eri instance 1
ndd -set /dev/eri adv_100T4_cap 0
ndd -set /dev/eri adv_100fdx_cap 1
ndd -set /dev/eri adv_100hdx_cap 0
ndd -set /dev/eri adv_10fdx_cap 0
ndd -set /dev/eri adv_10hdx_cap 0
ndd -set /dev/eri adv_autoneg_cap 0

Do the same on the other Ethernet interfaces. Hope that does it.

:) Enjoy,
Mohammed Tanvir

posted by Brahma at 12:37 PM 0 comments

Slow lsof on Solaris

Slow lsof on Solaris

If lsof is taking several minutes to complete on Solaris, make sure to download the latest package from Sunfreeware (or compile the latest version from source). The latest version can make a significant difference, such as this example on a Solaris 8 Sparc system.

lsof version 4.49:

# lsof -v
lsof version information:
revision: 4.49 -- find the latest revision at: ftp://vic.cc.purdue.edu/pub/tools/unix/lsof
configuration info: 64 bit kernel
constructed: Sun May 14 20:55:07 EDT 2000
constructed by and on: steve@solaris
compiler: /opt/SUNWspro/bin/cc
compiler version: WorkShop Compilers 5.0 98/12/15 C 5.0
compiler flags: -Dsolaris=80000 -DHASPR_GWINDOWS -xarch=v9 -DHASIPv6 -DHAS_VSOCK -DLSOF_VSTR="5.8" -O
loader flags: -L./lib -llsof -lkvm -lelf -lsocket -lnsl
system info: SunOS solaris 5.8 Generic sun4u sparc SUNW,Ultra-5_10

# time /usr/local/bin/lsof -i :23
real  4:22.2
user  3:37.7
sys      6.0

lsof version 4.68:

# /usr/local/bin/lsof -v
lsof version information:
revision: 4.68
latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
configuration info: 64 bit kernel
constructed: Wed Jul 23 04:48:34 EDT 2003
constructed by and on: steve@solaris
compiler: gcc
compiler version: 3.3
compiler flags: -Dsolaris=80000 -DHASPR_GWINDOWS -m64 -DHASIPv6 -DHAS_VSOCK -DLSOF_VSTR="5.8" -O
loader flags: -L./lib -llsof -lkvm -lelf -lsocket -lnsl
system info: SunOS solaris 5.8 Generic_108528-11 sun4u sparc SUNW,Ultra-5_10 Solaris

# time /usr/local/bin/lsof -i :23
real  4.1
user  0.8
sys   2.9

posted by Brahma at 12:36 PM 0 comments

ctime, atime, and mtime

ctime, atime, and mtime

It is important to distinguish between a file or directory's change time (ctime), access time (atime), and modify time (mtime).


ctime -- In UNIX, it is not possible to tell the actual creation time of a file. The ctime--change time--is the time when changes were made to the file or directory's inode (owner, permissions, etc.). It is needed by the dump command to determine if the file needs to be backed up. You can view the ctime with the ls -lc command.

atime -- The atime--access time--is the time when the data of a file was last accessed. Displaying the contents of a file or executing a shell script will update a file's atime, for example. You can view the atime with the ls -lu command.

mtime -- The mtime--modify time--is the time when the actual contents of a file were last modified. This is the time displayed in a long directory listing (ls -l).

In Linux, the stat command will show these three times.
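For quick reference (the file name is a placeholder):

ls -lc somefile      (ctime)
ls -lu somefile      (atime)
ls -l somefile       (mtime)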

posted by Brahma at 12:36 PM 0 comments

System errors defined

Miscellaneous UNIX notes

System errors defined
System errors are defined in /usr/include/sys/errno.h on Solaris systems and /usr/include/asm/errno.h on Red Hat Linux systems. The information in this file is helpful in interpreting output of the truss command (Solaris) or strace (Linux).

Example:
#define EPERM   1  /* Operation not permitted */
#define ENOENT  2  /* No such file or directory */
#define ESRCH   3  /* No such process */
#define EINTR   4  /* Interrupted system call */
#define EIO     5  /* I/O error */
#define ENXIO   6  /* No such device or address */
#define E2BIG   7  /* Arg list too long */
#define ENOEXEC 8  /* Exec format error */
#define EBADF   9  /* Bad file number */
#define ECHILD 10  /* No child processes */

Limiting find to one file system
The find command's -xdev argument can be used to limit searches to one file system.


Example: find all files on the root file system sorted smallest-to-largest. Do not descend other file systems (e.g., /usr, /var).

find / -xdev -ls | sort -n -k 7

Viewing "raw" man pages in nroff/troff format
nroff -man manpage | more -s

Example:
nroff -man qtool.8 | more -s

Printing man pages
To output a UNIX man page in a format suitable for printing, pipe the man page through col -b.

Example:
man command | col -b

Disabling ssh1 compatibility with ssh.com server
ssh protocol 1 is vulnerable to man-in-the-middle attacks with tools like dsniff, and should not be used unless absolutely necessary.

To disable ssh protocol 1 with an ssh.com server,

1. Edit the /etc/ssh2/sshd2_config configuration file.

2. Change:
Ssh1Compatibility yes
To:
Ssh1Compatibility no

3. Send the sshd process a SIGHUP for the change to take effect.
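For example (the daemon's process name may differ on your system; this is just a sketch):

pkill -HUP sshd2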

Zombie processes
A zombie process is a process that has exited, but whose exit code has not reached its parent process. The parent process has to perform a wait system call to read the exit code of a child. Until the parent receives the exit code, the child process will remain in "zombie" state.

Zombie processes are already dead and cannot be "killed." They consume no system resources except an entry in the system process table (seen in the proc-sz column with the sar -v command).

The only way to remove a zombie process is to kill its parent process.
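A quick way to spot zombies and their parent process IDs (a sketch):

ps -eo pid,ppid,s,comm | awk '$3 == "Z"'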


More information:
http://groups.google.com/groups?q=zombie+processes+wait&hl=en&lr=&ie=UTF-8&selm=1993Feb14.021655.13721%40acd4.acd.com&rnum=9

/etc/hosts on Windows
The file on Windows that provides the same functionality as /etc/hosts in UNIX is %SystemRoot%\system32\drivers\etc\hosts

stty: : Invalid argument
This message is often caused when running stty in the C shell initialization script .cshrc with a non-interactive shell (ex. an ssh, scp, rsh, or rsync command). stty should only be run in an interactive shell.

Example change in .cshrc to check for an interactive shell:

Change:
stty erase ^?

To:
if ( $?prompt && { tty -s } ) stty erase ^?

xterm Xt error: Can't open display:
If you receive this message when tunneling X11 traffic over an ssh tunnel, (1) make sure that the remote ssh server allows X11 forwarding with the X11Forwarding yes directive in the server configuration file (OpenSSH example), and (2) make sure that you are enabling X11 forwarding on your ssh client with the -X flag.

[hutch@hutch hutch]$ ssh hutch@server
[hutch@server hutch]$ echo $DISPLAY

[hutch@server hutch]$

[hutch@hutch hutch]$ ssh -X hutch@server
[hutch@server hutch]$ echo $DISPLAY
localhost:10.0
[hutch@server hutch]$

X11 tunneling after su -
In order to run X clients over an ssh tunnel after running su - for a root login shell, you have to manually specify the DISPLAY and XAUTHORITY environment variables. These steps are not needed when running su, su -m, or su -p.

Example:


/bin/su -
DISPLAY=localhost:10.0 XAUTHORITY=~hutch/.Xauthority X_client
-- or --
export DISPLAY=localhost:10.0 XAUTHORITY=~hutch/.Xauthority
X_client

sftp problems
When attempting to login to an OpenSSH sftp server, I received the following error:

Request for subsystem 'sftp' failed on channel 0
Couldn't read packet: Connection reset by peer

When receiving this error, make sure to check the permissions of sftp-server. In this case, the permissions on the directory containing sftp-server were incorrect:

# grep sftp-server /usr/local/etc/sshd_config
Subsystem sftp /usr/local/libexec/sftp-server

# ls -ld /usr/local/libexec /usr/local/libexec/sftp-server
drwx------ 2 root other   512 Oct 7 2003 /usr/local/libexec
-rwxr-xr-x 1 root other 28292 Oct 7 2003 /usr/local/libexec/sftp-server

To correct the problem:
chmod 755 /usr/local/libexec

X11 forwarding problems
When attempting to run an X client, I received the following errors:

debug1: X11 connection uses different authentication protocol.
X11 connection rejected because of wrong authentication.

In this case, the file system housing the user's home directory was full, resulting in a 0-byte ~/.Xauthority file. Freeing up space in the user's home directory fixed the problem.

sudo: must be setuid root
If you receive this error when executing sudo, first check to make sure that sudo is setuid root. A less obvious cause of this error is that sudo is located on a file system mounted nosuid. If this is the case, you will have to remount the file system suid if sudo is needed. Note that mount -o remount,suid file_system may not work; you may have to actually unmount the file system and remount it to fix the problem.
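Two quick checks (the sudo path below is just a typical install location):

ls -l /usr/local/bin/sudo      (look for the "s" in the owner execute position, e.g. -rwsr-xr-x)
mount | grep nosuid            (see whether the file system holding sudo is mounted nosuid)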


More information in this post.

posted by Brahma at 12:35 PM 0 comments

Restricting user access Email-only access

Restricting user access

Email-only access
Create a user account with a home directory of /dev/null and a shell that does not permit logins, such as /bin/false or /dev/null.

FTP-only access
Set the user's shell to one that does not permit logins, such as /bin/false or /dev/null.
Note: your FTP server may require that the user's shell is listed in the /etc/shells file.

Preventing FTP access
Add the user's account name into /etc/ftpusers.

Restricted access
Set the user's shell to a restricted shell such as /bin/rksh or /bin/rsh.

This prevents:
1. Use of the cd command
2. Setting or changing the PATH variable
3. Specifying a command or filename containing a slash (/) -- only filenames in the current directory can be used
4. Using output redirection (> or >>).

Restricting by user group

Add the following to /etc/profile:

if [ -n "`groups | grep {group_name}`" ] ; then
echo "Users from group {group_name} cannot login to this machine."
exit 1
fi

This would restrict telnet and rsh access for users using the Bourne shell or Korn shell. C shell users would still be able to access the machine.

Thanks to Augustus Carter for sending the following method of restricting C shell, Bourne shell, and Korn shell access on Solaris systems.


The following will restrict the C shell as well as the Bourne and Korn shells under Solaris 2.6, 7, 8, and 9 systems:

Create a text file called:
/etc/su_users.txt

This will have entries of usernames, one per line, like this:
luke
han
leia

Add the following code to the /etc/profile file:

# 04-26-2002 - Restricts telnet and ssh access for batch user accounts
# Bourne (sh) and Korn (ksh) shell users use the script in the /etc/profile file
# C (csh) shell users use the script in the /etc/.login file
# The /etc/su_users.txt file contains the list of batch accounts.
TTY=`tty | awk -F/ '{printf ($3"/"$4)}'`
USER_TTY=`w | awk '(\$2=="'$TTY'"){print \$1}'`
for USERID in `cat /etc/su_users.txt`
do
if [ "$USER_TTY" = "$USERID" ]
then
echo
echo Interactive logins for the $USER_TTY user are disabled.
echo Please login with your user id and do a su - $USER_TTY.
echo
exit
fi
done

Add the following code to the /etc/.login file:

# 04-26-2002 - Restricts telnet and ssh access for batch user accounts
# Bourne (sh) and Korn (ksh) shell users use the script in the /etc/profile file
# C (csh) shell users use the script in the /etc/.login file
# The /etc/su_users.txt file contains the list of batch accounts.
set TTY=`tty | awk -F/ '{printf ($3"/"$4)}'`
set USER_TTY=`w|awk '{if ($2=="'$TTY'") print $1}'`
foreach USERID (`cat /etc/su_users.txt`)
if ( "$USER_TTY" == "$USERID" ) then
echo
echo Interactive logins for the $USER_TTY user are disabled.
echo Please login with your user id and do a su - $USER_TTY.
echo
logout
endif
end

posted by Brahma at 12:34 PM 0 comments

Which Solaris cluster is installed?

Which Solaris cluster is installed?
To determine which "cluster" of the Solaris Operating Environment you have installed:
cat /var/sadm/system/admin/CLUSTER

Results:

SUNWCreq  -- Core System Support
SUNWCuser -- End User System Support
SUNWCprog -- Developer System Support
SUNWCall  -- Entire Distribution
SUNWCXall -- Entire Distribution plus OEM support

To list the packages in each of the clusters above, view the /var/sadm/system/admin/.clustertoc file. This file also contains descriptions of the clusters listed above.
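Sample output (the cluster shown is just an example):

$ cat /var/sadm/system/admin/CLUSTER
CLUSTER=SUNWCXall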

posted by Brahma at 12:34 PM 0 comments

sd.conf

Subject: Re: sd.conf

You can remove all entries in the sd.conf file that have target=n lun=m with n!=5 and m>0 for your system. It takes 1/4 second to determine that a target ID is not responding. My guess is that by adding the additional lun entries for those targets which aren't present, the software is spending 1/4 second per lun (even though it really shouldn't have to).

The target ID is the same as the SCSI ID on the SCSI bus. The lun value is a Logical Unit Number for a given target id.

The data shown below identifies the device as cntndn. c1 is the controller number. tn is the target id. dn is the LUN. So the entries you show below identify that your system has a controller #1 (c1) with target ID 5 (t5) and two LUN's (d0 and d1).


If a target ID doesn't respond to a SCSI selection (selection timeout) then no LUN's can exist for that target ID. This is why you can remove all entries for those ID's which you don't have attached to your bus. If a target ID does respond but the LUN is not available, the time to determine that a LUN is not available is minimal (probably in the microseconds range). So it doesn't hurt to leave all LUN's enabled for target ID 5 even if you only are using two of them.

Russ
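(A hypothetical trimmed sd.conf fragment for the layout described above -- keep a copy of the original file and do a reconfiguration boot after editing:)

name="sd" class="scsi" target=5 lun=0;
name="sd" class="scsi" target=5 lun=1;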


posted by Brahma at 12:33 PM 0 comments

configuring IPv4 interfaces: ifconfig: setifflags: SIOCSLIFFLAGS: interface: Cannot assign requested address interface

"configuring IPv4 interfaces: ifconfig: setifflags: SIOCSLIFFLAGS: interface: Cannot assign requested address interface"

An /etc/hostname.interface file exists, but the value in the /etc/hostname.interface file is not in /etc/hosts. Add the IP address and hostname listed in /etc/hostname.interface to /etc/hosts.
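For example (the interface, host name, and address are made up):

# cat /etc/hostname.hme0
myhost
# grep myhost /etc/hosts
10.200.17.5     myhost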

posted by Brahma at 12:33 PM 0 comments

Determining RPM of a Sun disk

Determining RPM of a Sun disk
Run mkfs -m /dev/rdsk/cxtxdxs2 and multiply the rps= value (revolutions per second) by 60 to determine the RPM.

ex. # mkfs -m /dev/rdsk/c0t0d0s2
mkfs -F ufs -o nsect=133,ntrack=27,bsize=8192,fragsize=1024,cgsize=16,free=10,rps=90,nbpi=2049,opt=t,apc=0,gap=0,nrpos=8,maxcontig=16 /dev/rdsk/c0t0d0s2 412964

90 (rps) * 60 (seconds) = 5400 RPM.

posted by Brahma at 12:32 PM 0 comments

Solaris routing with multiple interfaces


Solaris routing with multiple interfaces
In this example, we have an hme2 network interface with an IP address of 65.201.213.117, a network address of 65.201.213.112, and a subnet mask of 255.255.255.240. The interface was initially enabled using the default subnet mask and broadcast address.

ifconfig hme2
hme2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 65.201.213.117 netmask ff000000 broadcast 65.255.255.255

Routing Table: IPv4
Destination          Gateway              Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
65.0.0.0             65.201.213.117       U         1      8 hme2

To correct the routing table:

1. Specify the correct subnet mask and broadcast address with ifconfig.
ifconfig hme2 65.201.213.117 netmask 255.255.255.240 broadcast 65.201.213.127

2. Add an /etc/netmasks entry for the correct netmask to be used after a system reboot.
65.201.213.112  255.255.255.240

Corrected routing table:

Routing Table: IPv4
Destination          Gateway              Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
65.201.213.112       65.201.213.117       U         1      1 hme2

posted by Brahma at 12:29 PM 0 comments

patchadd return codes

patchadd return codes
The "Return codes" listed when installing the Sun Recommended Patch Clusters are listed toward the top of the patchadd script. Here are some common return codes:

# 2  Attempt to apply a patch that's already been applied
# 8  Attempting to patch a package that is not installed
# 25 A required patch is not applied

posted by Brahma at 12:28 PM 0 comments

"umount: I/O error"
umount: cannot unmount /mount_point

I received this error on a Solaris 8 system when the disk containing a Veritas volume failed. After killing all the processes identified with fuser -c /mount_point and confirming no other processes were in use in the file system with lsof +D /mount_point, I had to issue the umount -f /mount_point command to forcibly unmount the file system:

# umount /file_system
umount: I/O error
umount: cannot unmount /file_system
# fuser -c /file_system
/file_system:
# lsof +D /file_system
# umount -f /file_system
# mount | grep /file_system
#

For Solaris users, only Solaris 8 and later provide the umount -f ("force") flag. Other Unices may have the "force" flag for umount.

posted by Brahma at 12:28 PM 0 comments

Booting off an external CD-ROM

Booting off an external CD-ROM
To boot off an external CD-ROM, determine the device path with probe-scsi-all from the Open Boot PROM (OBP or "ok" prompt). Note: you may want to temporarily perform a setenv auto-boot? false and reset-all if OBP warns you that executing probe-scsi-all could hang your system.

Relevant probe-scsi-all output:

/sbus@b,0/SUNW,fas@3,8800000
Target 3
  Unit 0  Removable Read Only device  PLEXTOR CD-R PX-R820T 1.08 11/21/00 16:00

In this example, the SCSI target ID ("Target") is 3, and the physical path to the external CD-ROM is /sbus@b,0/SUNW,fas@3,8800000. To boot a Solaris Operating Environment CD using this external CD-ROM, you would issue the command:


{e} ok boot /sbus@b,0/SUNW,fas@3,8800000/sd@3,0:f

In the command above, sd@3,0 refers to the SCSI target (3) and LUN/Unit (0).

Note: when connecting an external CD-ROM device, make sure that the CD-ROM drive and system are both powered off. Connect the CD-ROM, and power on the CD-ROM and then the system. Failure to follow this procedure may result in the following errors:

{e} ok boot /sbus@b,0/SUNW,fas@3,8800000/sd@3,0:f
Boot device: /sbus@b,0/SUNW,fas@3,8800000/sd@3,0:f File and args:
Bad magic number in disk label
Can't open disk label package

Can't open boot device

posted by Brahma at 12:28 PM 0 comments

Repairing a hard disk with bad sectors

Repairing a hard disk with bad sectors
The format command may be used to repair a hard disk that is experiencing read/write errors.

Example error:

WARNING: /sbus@3,0/SUNW,fas@3,8800000/sd@0,0 (sd120):
Error for Command: read      Error Level: Retryable
Requested Block: 577712      Error Block: 577796
Vendor: SEAGATE              Serial Number: 9737K75183
Sense Key: Media Error
ASC: 0x16 (data sync mark error), ASCQ: 0x0, FRU: 0xd2

While the format command allows you to repair an individual block, it is probably best to run analyze and perform at least a read test, as there may be more than one bad block on the disk.

The analyze command will automatically repair errors it finds when scanning the hard disk for bad blocks.

analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

pass 0
WARNING: /sbus@3,0/SUNW,fas@3,8800000/sd@0,0 (sd120):
Error for Command: read      Error Level: Retryable
Requested Block: 577800      Error Block: 577800
Vendor: SEAGATE              Serial Number: 9737K75183
Sense Key: Media Error
ASC: 0x16 (data sync mark error), ASCQ: 0x0, FRU: 0xd2
Medium error during read: block 577800 (0x8d108) (267/8/0)
ASC: 0x16 ASCQ: 0x0
Repairing hard error on 577800 (267/8/0)...ok.

posted by Brahma at 12:26 PM 0 comments

Mounting an ISO image

Mounting an ISO image
The following instructions are for Solaris 8 and later. You must be root to mount the ISO image.

lofiadm -a absolute_path_to_ISO
mount -F hsfs -r /dev/lofi/1 mount_point

ex. mounting /home/hutch/cdrom.iso to /tmp/mnt
lofiadm -a /home/hutch/cdrom.iso
/dev/lofi/1

mount -F hsfs -r /dev/lofi/1 /tmp/mnt
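To undo it afterwards (assuming the same lofi device number):

umount /tmp/mnt
lofiadm -d /dev/lofi/1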

posted by Brahma at 12:26 PM 0 comments

gzip: "Value too large for defined data type"

gzip: "Value too large for defined data type"

You may receive this error when using gzip to compress files larger than 4 gigabytes in size. The gzip executable provided in Solaris 8 cannot compress files of this size, as illustrated by this example.

# ls -l maillog.0
-rw-r--r--   1 root     other    25050791936 Dec 16 04:34 maillog.0

# /usr/bin/gzip maillog.0
maillog.0: Value too large for defined data type

# grep "/usr/bin/gzip" /var/sadm/install/contents
/usr/bin/gunzip=../../usr/bin/gzip l none SUNWgzip
/usr/bin/gzcat=../../usr/bin/gzip l none SUNWgzip
/usr/bin/gzip f none 0555 root bin 61032 28158 1019010579 SUNWgzip


# pkginfo -l SUNWgzip
   PKGINST:  SUNWgzip
      NAME:  The GNU Zip (gzip) compression utility
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  11.8.0,REV=2000.01.08.18.12
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  The GNU Zip (gzip) compression utility
    PSTAMP:  on28-patch20020416193049
  INSTDATE:  Mar 11 2003 16:37
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:   25 installed pathnames
               5 shared pathnames
               2 linked files
               5 directories
               8 executables
             208 blocks used (approx)

To compress files larger than 4 gigabytes in size, build and install gzip with support for these large files, or use the gzip package 1.3.x or newer from

posted by Brahma at 12:26 PM 0 comments

File descriptors

File descriptors
A file descriptor is a handle created by a process when a file is opened. There is a limit to the number of file descriptors per process. The default Solaris file descriptor setting is 64.

If the file descriptor limit is exceeded for a process, you may see the following errors:

"Too Many Open Files"
"Err#24 EMFILE" (in truss output)

To display a process's current file descriptor limit, run /usr/proc/bin/pfiles pid | grep rlimit on Solaris systems.

Display system file descriptor settings:
ulimit -Hn (hard limit, cannot be exceeded)
ulimit -Sn / ulimit -n (soft limit, may be increased to the hard limit value)


Increasing file descriptor settings for child processes (example):
$ ulimit -Hn
1024
$ ulimit -Sn
64
$ ulimit -Sn 1024
$ ulimit -Sn
1024

Solaris kernel parameters:

rlim_fd_cur: soft limit

It may be dangerous to set this value higher than 256 due to limitations with the stdio library. If programs require more file descriptors, they should use setrlimit directly.

rlim_fd_max: hard limit

It may be dangerous to set this value higher than 1024 due to limitations with select. If programs require more file descriptors, they should use setrlimit directly.
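If you do decide to raise them system-wide, the /etc/system entries look like this (the values are illustrative; note the stdio and select caveats above):

set rlim_fd_cur=256
set rlim_fd_max=1024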

posted by Brahma at 12:25 PM 0 comments

"/usr/ccs/bin/ld: illegal option -- E"

"/usr/ccs/bin/ld: illegal option -- E"
If you receive this error when compiling software on a Solaris system, use the GNU version of ld provided by the Sunfreeware binutils package instead of /usr/ccs/bin/ld.

If you cannot specify the path to GNU ld (typically /usr/local/bin/ld) instead of /usr/ccs/bin/ld by means of an environment variable or configuration setting when compiling software, you may rename /usr/ccs/bin/ld as /usr/ccs/bin/ld.sun to ensure that GNU ld is used.

# mv /usr/ccs/bin/ld /usr/ccs/bin/ld.sun
Build and install the software.
# mv /usr/ccs/bin/ld.sun /usr/ccs/bin/ld

posted by Brahma at 12:25 PM 0 comments

"No space left on device" when installing packages

"No space left on device" when installing packages
If you receive this error when running pkgadd on datastream packages, you are likely filling up the /var file system, as pkgadd extracts the contents of the package into /var/tmp.

Example errors when filling up the /var file system when running pkgadd on a gcc 3.3 package:

Processing package instance <SMCgcc> from </tmp/gcc-3.3-sol7-sparc-local>

gcc(sparc) 3.3
cpio: Cannot write "reloc/lib/libstdc++.a", errno 28, No space left on device
cpio: Cannot write "reloc/lib/libstdc++.so.5.0.4", errno 28, No space left on device
cpio: Cannot write "reloc/lib/libsupc++.a", errno 28, No space left on device

The solution is either to symbolically link /var/tmp into a larger file system that can accommodate extracting the package, or to translate the package from datastream format (one monolithic file) into file system format (extracting the package into its component parts). I recommend the latter solution.

For example, if you have a large file system named /files1, you can translate the package into file system format with:
pkgtrans package /files1
ex. pkgtrans gcc-3.3-sol7-sparc-local /files1

To install the package:
pkgadd -d /files1

After the package is installed, you can remove the package in file system format with:
rm -r /files1/package
ex. rm -r /files1/SMCgcc

posted by Brahma at 12:25 PM 0 comments

Backup strategy

Joe,

I would need a great deal more information to give you a clear answer. I work in an environment with over 500 Windows servers. Currently we are using NetBackup by Veritas (Symantec now). Previously, when we were at 250 servers, we used BackupExec.


I have reviewed many workgroup and enterprise solutions over the past six years, so I can give you a pretty good idea of what to look at.

Here is the information I need:

How many servers do you need to back up?
How much data is backed up on a full backup?
Do you know the backup strategy you would like to use? That is, do you want to perform an incremental on a daily basis, or a differential? I would assume you would want to perform a full once a week.
Are you running MS SQL and/or Oracle?
What are you using for email systems?
Do you have SAN disk storage in place?
Are you concerned about backing up UNIX as well as Windows?
Is there a possibility of backing up clients?
What hardware do you have to work with? How old is it?
What budget do you have to work with?

Once you get this information to me, I will be able to help you make that decision.

Thanks,
Mark Bellows

posted by Brahma at 12:24 PM 0 comments

Get your expert 10 Step Migration Survival Guide and poster

Weekly: Get your expert 10 Step Migration Survival Guide and poster

If you have been through an Exchange migration, you already know how challenging they are. You know to prepare and seek help. If you haven't been through one, take it from those who have. Get help now. Ask the experts what to do and you'll survive your migration - and come out a hero. This 10 Step Migration Survival Guide and poster will serve as a basic roadmap during your migration and help you avert disaster and deliver a migration without impact on your organization.

The 10 steps include:
1. Motivate
2. Investigate
3. Gather
4. Restructure
5. Architect
6. Test
7. Implement
8. Organize
9. Notify
10. Sign Off

Get your copy today and do more than just survive your migration.

posted by Brahma at 12:24 PM 0 comments

Test for directory name in upper case

RE: Test for directory name in upper case

Hi,

Please try this; I think it should work:

ls -al | grep '^d' | awk '$9 ~ /^\.?[A-Z]+$/'

regards,
raj

posted by Brahma at 12:23 PM 0 comments

Solaris packages Creation

packages

The following are the steps I used to create a Solaris 2.6 package for the Mutt e-mail client. Although these steps may be applied to other software, please visit the excellent Sunfreeware.com Creating Packages page for additional information on creating Solaris packages.

1. Uncompress the mutt source.
gunzip -cd mutt-1.4i.tar.gz | tar xvf -

2. Create an alternate destination directory for the package you will create.
mkdir /tmp/pkg

3. Change mutt's configure file to use /tmp/pkg as the destination directory.
cd mutt-1.4
vi configure

Change:
ac_default_prefix=/usr/local

To:
ac_default_prefix=/tmp/pkg

4. Build mutt in the alternate destination directory:
./configure && make && make install


5. Create a package prototype file containing a list of all files that were just installed.
cd /tmp/pkg
find . -print | pkgproto > /tmp/pkg/prototype

6. Edit the package prototype and change your username and group to bin. In this example, my username is hutchib and my group is nsm.
vi /tmp/pkg/prototype

Change username hutchib and group nsm to bin.
:%s/hutchib/bin/g
:%s/nsm/bin/g

7. Add the following line to the top of your prototype file.
i pkginfo=./pkginfo

8. Create a pkginfo file for your package.
vi /tmp/pkg/pkginfo

Add:
PKG="BHmutt"
NAME="mutt"
ARCH="sparc"
VERSION="1.4i"
CATEGORY="application"
VENDOR="mutt.org"
EMAIL="[email protected]"
PSTAMP="Brandon Hutchinson"
BASEDIR="/usr/local"
CLASSES="none"

Here are descriptions from Sunfreeware's page on what the fields mean:

PKG = the name you have chosen for the package directory
NAME = the program name
ARCH = the operating system version
VERSION = the version number for your program
CATEGORY = the program is an application
VENDOR = whoever wrote the software
EMAIL = an email contact
PSTAMP = the person who did the port perhaps
BASEDIR = the /usr/local directory where the files install
CLASSES = just put none here

9. Create the package (pkgmk writes it in file system format under /var/spool/pkg).
cd /tmp/pkg
pkgmk -r `pwd`


10. Translate the package into datastream format.
cd /var/spool/pkg
pkgtrans -s `pwd` /tmp/mutt-1.4i

11. Compress the package.
gzip /tmp/mutt-1.4i

You should now be able to install the package on the appropriate platform using the pkgadd command.
# pkgadd -d ./mutt-1.4i

posted by Brahma at 12:22 PM 0 comments

Disk duplication

Disk duplication

It is easy to duplicate a disk with the same geometry (cylinders, heads, sectors) with the dd command. In the following example, I will duplicate boot disk c0t0d0 with c0t1d0 on a Solaris system. Of course, this is not the same as mirroring the boot disk.

The dd command's bit-for-bit copy includes the partition table and boot block, so duplicating the partition table with prtvtoc and making the disk bootable with installboot is not necessary.

# format < /dev/null
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /sbus@1f,0/SUNW,fas@e,8800000/sd@0,0
       1. c0t1d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /sbus@1f,0/SUNW,fas@e,8800000/sd@1,0
Specify disk (enter its number):

1. Make a bit-for-bit copy of the source disk to the destination disk with dd. Use a 1 megabyte blocksize instead of the 512 byte default to speed up the operation.
dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c0t1d0s2 bs=1024k

2. Change the /etc/vfstab file on the duplicate boot disk to reflect the correct SCSI ID.


mkdir /tmp/mnt
mount /dev/dsk/c0t1d0s0 /tmp/mnt
vi /tmp/mnt/etc/vfstab

Change references of c0t0d0 to c0t1d0:

:%s/c0t0d0/c0t1d0/g
:wq!

umount /tmp/mnt

3. Test booting off the duplicate boot disk (assuming disk1 is the correct Open Boot Prompt device alias for c0t1d0s0).

reboot -- disk1

posted by Brahma at 12:22 PM 0 comments

Gathering Solaris system information

Gathering Solaris system information
A UNIX administrator may be asked to gather system information about his/her Solaris systems. Here are the commands used on a Solaris 7 system to gather various system information.

Processors
The psrinfo utility displays processor information. When run in verbose mode, it lists the speed of each processor and when the processor was last placed on-line (generally the time the system was started unless it was manually taken off-line).

/usr/sbin/psrinfo -v
Status of processor 1 as of: 12/12/02 09:25:50
  Processor has been on-line since 11/17/02 21:10:09.
  The sparcv9 processor operates at 400 MHz,
        and has a sparcv9 floating point processor.
Status of processor 3 as of: 12/12/02 09:25:50
  Processor has been on-line since 11/17/02 21:10:11.
  The sparcv9 processor operates at 400 MHz,
        and has a sparcv9 floating point processor.

The psradm utility can enable or disable a specific processor.

To disable a processor:
/usr/sbin/psradm -f processor_id

Page 262: Solaris Real Stuff

To enable a processor:/usr/sbin/psradm -n processor_id

The psrinfo utility will display the processor_id when run in eitherstandard or verbose mode.

RAMThe prtconf utility will display the system configuration, includingthe amount of physical memory.

To display the amount of RAM:

/usr/sbin/prtconf | grep MemoryMemory size: 3072 Megabytes

Disk spaceAlthough there are several ways you could gather this information, thefollowing command lists the amount of kilobytes in use versus totalkilobytes available in local file systems stored on physical disks.The command does not include disk space usage from the /proc virtualfile system, the floppy disk, or swap space.

df -lk | egrep -v "Filesystem|/proc|/dev/fd|swap" | awk '{total_kbytes += $2 } { used_kbytes += $3 } END { printf "%d of %dkilobytes in use.\n", used_kbytes, total_kbytes }'19221758 of 135949755 kilobytes in use.

You may want to convert the output to megabytes or gigabytes and display the statistics as a percentage of utilization.
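As a sketch of that idea (not part of the original write-up), the same df pipeline can report gigabytes and a utilization percentage:

df -lk | egrep -v "Filesystem|/proc|/dev/fd|swap" | awk '{ total += $2; used += $3 } END { printf "%.1f of %.1f GB in use (%.1f%%)\n", used/1048576, total/1048576, 100*used/total }'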

The df command above lists file system usage. If you are interested in listing physical disks (some of which may not be allocated to a file system), use the format command as the root user, or the iostat -En command as a non-privileged user.

Processor and kernel bits

If you are running Solaris 2.6 or earlier, you are running a 32-bit kernel.

Determine bits of processor:

isainfo -bv

Determine bits of Solaris kernel:

isainfo -kv

posted by Brahma at 12:20 PM 0 comments


CPU statistics

ans m41 wrote:
> it gives only a current snapshot

Without arguments, it actually gives an average of stats since last reboot.

> it isn't possible to view previously collected data
> with mpstat, isn't it?

mpstat doesn't collect samples, but you can easily script it, e.g.:

mpstat 30 120 > /var/tmp/mpstat.out

will collect for one hour with a 30 sec interval.

You can also directly peek the kernel counters like this:

kstat -T d -p cpu:*:sys:cpu_ticks_idle cpu:*:sys:cpu_ticks_kernel cpu:*:sys:cpu_ticks_user 30

posted by Brahma at 12:20 PM 0 comments

Friday, August 12, 2005

Need help to tar files in a directory older(creation date) than 45 days old


Need help to tar files in a directory older (creation date) than 45 days old. Say there are 1000 files in the directory and only 300 of them are older than 45 days; I want to archive (tar) only the 300 older files and leave the rest. Filename structure is:

filename.dat.Z (they have been compressed already)

I just can't seem to get anything I try to work correctly, please help!

I was trying combinations of find and tar but it wasn't working right.

d

> I was trying combinations of find and tar wasn't working right.


Linux:
find . -type f -mtime +45 -print0 | xargs -0 tar cf ../file.tar

Solaris, HPUX 11.11:
find . -type f -mtime +45 -exec tar cf ../file.tar {} +

If none of the above works, and file names do not have spaces:
find . -type f -mtime +45 -print | xargs tar cf ../file.tar

> find . -type f -mtime +45 -print0 | xargs -0 tar cf ../file.tar
> find . -type f -mtime +45 -exec tar cf ../file.tar {} +
> find . -type f -mtime +45 -print | xargs tar cf ../file.tar

Wouldn't the above methods overwrite "../file.tar" each time the max # of arguments to 'tar' is reached (first and third cases, due to repeating the tar invocation with a new list of filename arguments), or for each new file found (second case, one tar invocation per file found)?


> Linux:
> find . -type f -mtime +45 -print0 | xargs -0 tar cf ../file.tar

Without xargs:

find . -type f -mtime +45 -print | tar -cf file.tar -T -
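A note not in the original thread: stock Solaris tar reads a list of files with the I key letter rather than GNU tar's -T, and handing tar a single pre-built list also sidesteps the "archive overwritten on each xargs batch" concern raised above. A sketch:

# build the list once, then a single tar invocation
find . -type f -mtime +45 -print > /tmp/filelist

# GNU tar (Linux)
tar cf ../file.tar -T /tmp/filelist

# Solaris tar
tar cfI ../file.tar /tmp/filelist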

posted by Brahma at 3:46 PM 0 comments

Using the script command within a program

Using the script command

I want to use the script command within a shell script to capture the commands and their output. I know I can redirect stdout/stderr to a file but I would prefer to use the script command.

If I use it in a bash shell, the command starts up fine but I cannot figure out how to stop it. I tried using exit, but it simply exits out of the shell script not the command script.

Is this possible? If so how?

Thanks,
Kevin.

> If I use it in a bash shell, the command starts up fine but I cannot
> figure out how to stop it. I tried using exit, but it simply exits out
> of the shell script not the command script.

What you probably want is:

script -c /path/to/your/script /path/to/output/file

--

> What you probably want is:

> script -c /path/to/your/script /path/to/output/file

Assuming that your version of the script command has a '-c' option. Otherwise try

SHELL=/path/to/your/script script /path/to/output/file

and reset SHELL inside your script (you can save the value of SHELL if needed in another variable).
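A minimal sketch of that SHELL trick (the file names are made up for illustration):

#!/bin/sh
# wrapper: run the real script under script(1), capturing everything it prints
SHELL=/usr/local/bin/myscript.sh; export SHELL
exec script /var/tmp/myscript.log

and then, at the top of /usr/local/bin/myscript.sh:

#!/bin/sh
SHELL=/bin/sh; export SHELL     # restore a sensible shell for child processes
# ... rest of the script; its output is now captured in /var/tmp/myscript.log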


posted by Brahma at 3:45 PM 0 comments

Two IP's Addresses

You have to create one hostname file for each ip:

hostname.eri0
hostname.eri0:1
hostname.eri0:2
hostname.eri0:3
hostname.eri0:4
hostname.eri0:5
hostname.eri0:6

In every file you have to enter one line with the hostname of the IP. The hostname must have a corresponding entry in /etc/hosts where you have to specify the IP.
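A sketch for the two addresses asked about below (the hostnames are made up; only the addresses come from the original question):

/etc/hostname.eri0      contains:  myhost
/etc/hostname.eri0:1    contains:  myhost-2

/etc/hosts entries:
192.168.4.28    myhost
192.168.4.251   myhost-2

To bring the second address up without rebooting (Solaris 8):

ifconfig eri0:1 plumb
ifconfig eri0:1 192.168.4.251 netmask 255.255.255.0 up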

Cu, florian

----------------------------------------------------------------------
Florian Meister
"The only problem with troubleshooting is that sometimes trouble shoots back."
----------------------------------------------------------------------

Subject: Two IP's Addresses

Hello everyone.
Does somebody know how I can configure two IP addresses (192.168.4.28 and 192.168.4.251) on the same Ethernet 10/100Mbps network card?
The server is a Sun running the Solaris 8 OS.
Thank you.

EDSON

posted by Brahma at 3:44 PM 0 comments

syslog.conf wildcards

==================
TOPIC: syslog.conf wildcards

hi

i recently tried to change the syslog.conf file to have

*.* @remoteserver

where "remoteserver" is a remote host with syslog daemon running.I restarted syslog by using kill -HUP "syslog pid".When i use the logger command to insert a custom log message, themessage did not get transfered to "remoteserver".But when i changed the syslog.conf to something like

auth.notice @remoteserver

and using the logger command after restarting syslog, it works. Themessage did get transfered to "remoteserver".

why doesn't the wildcard work? Is there something else i need to do ?

thanks


[email protected] wrote:
> i recently tried to change the syslog.conf file to have
> *.* @remoteserver

Solaris syslog doesn't allow *.*, try the following

*.debug @remoteserver

which should have the same effect (and remember to use TABs :-))
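A quick sketch of the working line and a test (the facility/level passed to logger here is just an example):

# /etc/syslog.conf -- selector and action must be separated by a TAB, not spaces
*.debug	@remoteserver

kill -HUP `cat /etc/syslog.pid`
logger -p auth.notice "syslog forwarding test"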

--
Geoff Lane
The attitude "The computer said so, so it must be right" is always amusing to the people who program them.

posted by Brahma at 3:42 PM 0 comments

can I look at the threads for an executable run?

==============================================================================
TOPIC: can I look at the threads for an executable run?

> I have a multi-threaded application on Solaris 10; I wonder how I can
> tell from the OS level how many threads this app is running?

"ps -o nlwp" (and more so you can tell which process is running with thatmany lwps)

ps -o lwp -p <pid>
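For reference (not from the thread), two commands commonly used for this on Solaris:

ps -o pid,nlwp,comm -p <pid>     # NLWP column = number of threads (LWPs)
prstat -L -p <pid>               # one line per LWP, refreshed like prstat normally is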

posted by Brahma at 3:41 PM 0 comments

TOPIC: SOLARIS LDAP CLIENT INTEGRATION WITH OPENLDAP


I am starting to set up my Solaris 8 boxes to talk to a Linux OpenLDAP server (2.2.6). I cannot remember the patch ID but I've patched the Solaris boxes with the latest LDAP client patches - I'm aware there were some issues with the original ldapclient in Sol8.

The howtos I'm following have me manually editing the ldap_client_file (against Sun's recommendation) but it seems to be the way of getting the native Sun LDAP client to speak with OpenLDAP.

The test Solaris boxes happily query the LDAP server for their user information and users can log in - the Solaris LDAP client uses a specially created (proxy) LDAP account to log into the LDAP server.

The problem I have is the root account cannot arbitrarily reset passwords (as it should normally do); if root tries to change a user's password it prompts for the user's current password before asking for the new one (a la a regular user). The password change does work.

On Linux using the PADL nss_ldap libraries there is an option in the LDAP configuration files to define a "root" login for LDAP, i.e. any process running as UID 0 uses the given credentials (defined as rootDN) which will be suitably permissioned.

I don't have the option of using the PADL nss_ldap libraries on Solaris (I know I could compile them); I have to use what is there already. How does Solaris normally get around this problem, or are all management functions (e.g. resetting passwords) done through some fancy GUI-based tool (we're not running X on the boxes either)?

So after rambling away my question is... is there an equivalent configuration option in the Solaris native LDAP client to the PADL nss_ldap configuration option RootDN?

I have a suspicion the answer is no and all management level stuff is done purely through the GUI tool rather than existing ldapified commands, e.g. passwd.

Thanks in advance for any insights.

> I have a suspicion the answer is no and all management level stuff is
> done purely through the GUI tool rather than existing ldapified
> commands e.g. passwd

We don't use the GUI tools either, and also don't use passwd and such. We use LDAP Administrator (www.ldapadministrator.com) to do all administration in LDAP directly. No need to pay for LDAP Administrator, there are also other tools around (LDAP Browser/Editor by Jarek Gawor is the one I'm using at home).

HTH, Erik.

posted by Brahma at 3:40 PM 0 comments

tcsh startup tasks



What exactly does tcsh do after completing processing of .cshrc? I have a delay of 3 to 5 seconds every time I log in to my Solaris account, which uses /bin/tcsh as my login shell.

I added the line

echo "LEAVING .cshrc"

as the last line of .cshrc. This print statement is printed very quickly after I log in. Thus, the problem does not lie within .cshrc.

Also, only *after* the 3-5 sec delay is over do I enter .login (because I added an echo command as the first line of that file). Hence, I'm interested in whatever occurs after .cshrc but before .login.

I do not have a .tcshrc file, and /etc/csh.cshrc and /etc/csh.login do not exist.


The Solaris 8 man page says:

If the shell is a login shell, this is the sequence of invocations: First, commands in /etc/.login are executed. Next, commands from the .cshrc file in your home directory are executed. Then the shell executes commands from the .login file in your home directory; the same permission checks as those for .cshrc are applied to this file.

and

If you start csh as a login shell and you do not have a .login in your home directory, then the csh reads in the /etc/.login.

The 1st statement is true, and IMHO the 2nd is wrong. Anyway, there should be nothing between .cshrc and .login.

You can test the tcsh login as follows:

mkdir tmp
cd tmp
ln -s /bin/tcsh ./-tcsh
env PATH=${PATH}:. -tcsh


And replace the latter by

truss ...

-- Michael Tosch @ hp : com


Try setting the ECHO and the VERBOSE variables inside your .tcshrc:

setenv echo
setenv verbose

Use lowercase! And make sure to do setenv, not just set. You want these variables to carry on into further processes. These will show you every command executed.

I found the culprit - an 8MB .history file! Removed that file and not only did logins speed up, but logouts were affected as well.

-Bob
Andover, MA

posted by Brahma at 3:33 PM 0 comments

Change Network parameters

Re: Weblogic instance going down again and again

dp -

You didn't provide much information on your environment for the weblogic servers. I don't think anyone out there could help you debug something we don't have information for. Here are some things you can do on the OS side to ensure your issue isn't related to bad system tuning. I'm not going to list the tunables for each OS version, so I will assume it is running Solaris 8.

1. You need to check your file descriptor limits!!! Solaris 8 is 8192 - check your open files, using lsof on the application directory with a word count. Check it against your current file descriptor limits (ulimit -a, ulimit -n). If the limit is not high enough, then bump it up. On second thought, bump it up anyway, it could only hurt you keeping it too low.

* In /etc/system add the lines set rlim_fd_max=8192 and set rlim_fd_cur=8192

2. Here are some common tunables, also in the system file:

set tcp:tcp_conn_hash_size=32768
set autoup=900
set tune_t_fsflushr=1

3. You may also want to increase (as long as enough shared memory is available) the java memory: -Xms128m -Xmx200m

4. Finally, set the tcp tunables. I'm including the ones I use, but they should only be used as guidelines for you. These will not be permanent, so you need to include them in an rc startup script.

/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 60000
/usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 16384
/usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q0 16384
/usr/sbin/ndd -set /dev/tcp tcp_ip_abort_interval 60000
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 320000
/usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_initial 4000
/usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_max 10000
/usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_min 3000
/usr/sbin/ndd -set /dev/tcp tcp_smallest_anon_port 32768
/usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 131072
/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 131072
/usr/sbin/ndd -set /dev/tcp tcp_naglim_def 1
/usr/sbin/ndd -set /dev/ce instance 0
/usr/sbin/ndd -set /dev/ce rx_intr_time 32
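Since ndd settings do not survive a reboot, a minimal init script sketch to reapply them at startup (the file name and run level are illustrative, not from the original post):

#!/sbin/sh
# /etc/init.d/nddconfig -- reapply TCP tunables at boot
# ln -s /etc/init.d/nddconfig /etc/rc2.d/S70nddconfig
case "$1" in
start)
        /usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 60000
        /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 16384
        # ...repeat for the remaining ndd -set lines above...
        ;;
esac
exit 0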

Regards,Brian

posted by Brahma at 3:32 PM 0 comments

can not crontab -e

Re: can not crontab -e !

Hi ,

Check whether your EDITOR environment variable is set to vi or not, or you can set it directly at the command prompt:

EDITOR=vi; export EDITOR     (sh/ksh)
setenv EDITOR vi             (csh/tcsh)

posted by Brahma at 3:30 PM 0 comments

ptree CDE hang

CDE hang

Hi,


When I use the X manager to connect to one of the systems, it hangs on the "Starting the Common Desktop Environment CDE Version 1.4" screen after supplying the login ID and password.

How to solve this problem?

Thanks in advance.


Michael Tosch You can debug it.

ptree `pgrep dtlogin | head -1`

and run "truss -p" on the youngest process ID.

How to go about it?

# truss -p 6005
poll(0xFFBEFAA8, 2, -1)         (sleeping...)
signotifywait()                 (sleeping...)
lwp_cond_wait(0xFF1234E8, 0xFF1234F8, 0xFF11CD80) (sleeping...)
lwp_cond_wait(0xFF11CD28, 0xFF11CD10, 0x00000000) (sleeping...)

lwp_cond_wait(0xFF1234E8, 0xFF1234F8, 0xFF11CD80) Err#62 ETIME
poll(0xFFBEFAA8, 2, -1)         (sleeping...)
signotifywait()                 (sleeping...)
lwp_cond_wait(0xFF1234E8, 0xFF1234F8, 0xFF11CD80) (sleeping...)
lwp_cond_wait(0xFF11CD28, 0xFF11CD10, 0x00000000) (sleeping...)

> Sorry, how to go about running the ptree command?

> YC

man ptree

/usr/bin/ptree
older Solaris versions have
/usr/proc/bin/ptree

Here you go:

# ptree -a 9933
1     /etc/init -
  6005  /usr/dt/bin/dtlogin -daemon
    9933  /usr/dt/bin/dtlogin -daemon
      9957  /bin/ksh /usr/dt/bin/Xsession
        9983  /usr/dt/bin/solregis -cd
        9997  /usr/dt/bin/sdt_shell -c unset _ PWD; ./export/home/ope
          10000  /usr/bin/bash

Hmm,

You likely have an offending line in your .profile; it looks like you have "/usr/bin/bash", which starts another interactive session.

Remove this line!

posted by Brahma at 3:29 PM 0 comments

Monday, August 08, 2005

search inode symbolic links

After getting the inode number by

ls -Lli file

you can find hard links by running "find ... -inum ..." on the same file system. A symbolic link can be in any file system, and there are different representations for it (e.g. " -> afile" or " -> ./afile").

> 2) In case we have symlinks to that file - where they are ?

One can try to find them by running "find ... -follow -inum ..." on each file system. However, there might be symbolic links on another host that are not valid on *this* host.
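A concrete sketch of both searches (the inode number and paths are made up for illustration):

ls -Lli /export/data/file            # note the inode number, say 12345

# hard links: confined to the same file system (-mount keeps find on it)
find /export/data -mount -inum 12345 -print

# symbolic links that resolve to that inode: repeat per file system
find / -follow -inum 12345 -print 2>/dev/null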

posted by Brahma at 8:35 AM 0 comments

How do I find out what my machine's memory is being used for?

Subject: 6.4) How do I find out what my machine's memory is being used for? How can I tell if I need more memory?


To discover how much virtual memory (i.e. swap) is free, run "swap -s" or "vmstat". If you're using tmpfs for /tmp, "df /tmp" will also work.

Discovering how physical memory is being used can be more difficult, however. Memory pages that are not being used by processes are used as a sort of extended cache, storing pages of memory-mapped files for possible later use. The kernel keeps only a small set of pages free for short-term use, and frees up more on demand. Hence the free memory reported by vmstat is not an accurate reflection, for example, of the amount of memory available for user processes.

An easy way to determine whether or not your machine needs more memory is to run vmstat and examine the po (page out) column and the sr (scan rate) column. If these columns consistently show large numbers, this suggests that your machine does not have enough memory to support its current workload, and frequently needs to write pages belonging to active processes to disk in order to free up enough memory to run the current job.
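A small sketch of that check (it assumes the standard Solaris vmstat layout, where po is column 9 and sr is column 12):

vmstat 5 13 | awk 'NR > 3 { po += $9; sr += $12; n++ } END { printf "avg po=%.1f avg sr=%.1f over %d samples\n", po/n, sr/n, n }'

The first two lines are headers and the third is the since-boot summary, so they are skipped; consistently non-zero averages point at memory pressure.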

posted by Brahma at 8:32 AM 0 comments

What can I do if my machine slows to a crawl or just hangs?

Subject: 6.2) What can I do if my machine slows to a crawl or just hangs?

Try running "ps" to look for large numbers of the duplicate programs orprocesses with a huge size field. Some system daemons occasionally canget into a state where they fork repeatedly and eventually swamp thesystem. Killing off the child processes doesn't do any good, so you haveto find the "master" process. It will usually have the lowest pid.

Another useful approach is to run vmstat to pin down what resource(s)your machine is running out of. You can tell vmstat to give ongoingreports by specifying a report interval as its first argument.

The programs "top" and "sps" are good for finding processes that areloading your system. "Top" will give you the processes that are consumingthe most cpu time. "Sps" is a better version of "ps" that runs much fasterand displays processes in an intuitive manner. Top is available atftp://ftp.groupsys.com/pub/top/. Sps is available atftp://ftp.csv.warwick.ac.uk/pub/solaris2/sps-sol2.tar.gz.

Doug Hughes <Doug dot Hughes at Eng dot Auburn dot EDU> has written asmall, quick PS workalike called "qps", available from his web page athttp://www.eng.auburn.edu/users/doug/second.html

Page 275: Solaris Real Stuff

Sometimes you run out of memory and you won't be able to run enoughcommands to even find out what is wrong. You will get messages of the type"out of memory" or "no more processes". Note that "out of memory" refers tovirtual memory, not physical memory. On a Solaris system, virtual memoryis generally equal to the sum of the swap space and the amount of physicalmemory (less a roughly constant amount for the kernel) on the machine. Thecommand "swap -s" will tell you how much virtual memory is available.

You can sync the disks to minimize filesystem corruption if you have tocrash the system:

Use the L1-A sequence to crash the system. If you are on an older system,type "g0" and you will get the message "panic: ... syncing file systems".When you see the word "done", hit L1-A again and reboot. On systemswith the "new" prom, type "n" to get into the new command mode and type"sync".

posted by Brahma at 8:31 AM 0 comments

what is the minor faults meaning in mpstat

Subject: SUMMARY: what is the minor faults meaning in mpstat

Thank you all for the responses. Russell's answer was the clearest one and I think everybody can benefit from reading this. Not only those who don't know, but also those who already know it may find it refreshing to read this one:

A process running on a typical 32-bit system is allowed to address 4 GB of memory. There is a really huge amount available to it on a 64-bit system. This memory isn't really there, so it is called "virtual memory". The range of available addresses in virtual memory is called the process's "address space."

Most of the time the process will be holding only a few pointers into different parts of its address space and will be attempting to read and write memory through those pointers. All of these memory accesses are intercepted by the operating system (OS), which maps the "virtual" address the process is using to a real address in RAM.

Obviously there isn't enough RAM on most computers to even do a one-to-one mapping from the process's virtual addresses to physical addresses. Fortunately, because the process can only access its address space through pointers, it can only access a few addresses at a time. Moreover, generally each new access is to an address that is near the most recent access.

Statistically, most of a process's pages will only get accessed once. However, in the short run there will be a small block of pages that are being accessed frequently, because most programs contain loops, and repeatedly read certain pages. Only rarely (hopefully!) will a process access a page that it last accessed a relatively long time ago.

Here is how the OS manages all of this:

1. Chunks of real memory are allocated to the process in pages. On Solaris, the base page size is 8192 bytes on UltraSPARC systems and 4096 bytes on x86 (the pagesize command reports it). Each page is timestamped whenever a process references it. If the process writes on the page it is marked as "dirty".

2. Periodically a kernel process called pageout (PID=2) sweeps through physical memory and moves pages that haven't been accessed by a process for a long time to a free list. These times are on the order of tens of seconds. During a memory shortage, the sweep rate increases.

3. If there is a shortage of physical memory, the oldest, dirty pages in the free list get written onto the swap device. When a process accesses a virtual address, one of three things can happen:

1. The corresponding physical page is in memory. The access succeeds.

2. The page is not in memory - but it is on the disk. This is called a "major fault". The process is blocked and a context switch occurs. While the process is waiting, the page is brought off the disk into memory, the process's Hardware Address Table (HAT) is updated and the process is marked "runnable".

3. The page is no longer mapped into the process, but is on the free list. This is called a "minor fault." (I got there at last!) In this case a context switch still occurs and the page is removed from the free list and mapped back into the process.

There is a more light-hearted explanation here:
http://www.netjeff.com/humor/item.cgi?file=TheThingKing

posted by Brahma at 8:31 AM 0 comments

top 5 CPU consuming processes.

prstat -s cpu -n 5

means sort the processes by CPU utilization and show the top 5 CPU-consuming processes.


posted by Brahma at 8:30 AM 0 comments

CDE hang

TOPIC: CDE hang

== 1 of 1 ==
Date: Thurs 4 Aug 2005 13:18
From: Michael Tosch

NewBie wrote:
> Hi,
>
> When I use the X manager to connect to one of the systems, it hangs on the
> "Starting the Common Desktop Environment CDE Version 1.4" after supplying the
> login ID and password.
>
> How to solve this problem?
>
> Thanks in advance.

You can debug it.

ptree `pgrep dtlogin | head -1`

and run "truss -p" on the youngest process ID.

posted by Brahma at 8:30 AM 0 comments

Thursday, August 04, 2005

UNIX system performance and resource utilization measuring tool

TOPIC: UNIX system performance and resource utilization measuring tool

>>> Is there any UNIX system performance and resource utilization measuring
>>> tool with graphical display available for free?
>>
>> For Solaris, here's one:
>>
>> http://www.sunfreeware.com/setoolkit.html
>>
>> Try "se zoom.se"
>
> And if you want something fancier than that, you can get Orca, which is
> built on top of the se toolkit.

I downloaded Orca but was not able to install it, though both the SE toolkit and rrdtool had installed successfully.

Got the following when the make command was issued:

pod2man --release=1.0.40 --center=rrdtool rrdtool.pod > rrdtool.1
sh: pod2man: not found
*** Error code 1
make: Fatal error: Command failed for target `rrdtool.1'
Current working directory /export/home/operator/orca-0.27/packages/rrdtool-1.0.40/doc
*** Error code 1
make: Fatal error: Command failed for target `all-recursive'
Current working directory /export/home/operator/orca-0.27/packages/rrdtool-1.0.40
*** Error code 1
make: Fatal error: Command failed for target `make_rrdtool'
Current working directory /export/home/operator/orca-0.27/packages
*** Error code 1
make: Fatal error: Command failed for target `all'

posted by Brahma at 10:04 AM 0 comments

sendmail

* Vortex <[email protected]> wrote:
[..]
> The line concerning alias in my sendmail.mc is:
>
> define(`ALIAS_FILE',`/etc/mail/aliases')dnl

I'll just assume you also created a new sendmail.cf file from the sendmail.mc after changing it and restarted the daemon.
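For reference, a typical rebuild sequence on Solaris looks something like the sketch below (directory locations vary with the sendmail installation, so treat the paths as assumptions rather than the poster's setup):

cd /usr/lib/mail/cf
/usr/ccs/bin/m4 ../m4/cf.m4 sendmail.mc > /etc/mail/sendmail.cf
newaliases
/etc/init.d/sendmail stop; /etc/init.d/sendmail start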

As root, run "/path/to/sendmail -bv root" and post the output.

--
Progress (n.): The process through which Usenet has evolved from smart people in front of dumb terminals to dumb people in front of smart terminals.
-- [email protected]

posted by Brahma at 10:03 AM 0 comments

Printing

If I `cat filename | telnet a.b.c.d 9100` the printer prints the file; however if I `lpr -P hplj4100ps_1 /etc/printers.conf` or I do `lp -d hplj4100ps_1 /.profile` only the first job will be printed (repeatedly), the second job never prints, and both jobs are stacked in the queue, so I have to remove them forcefully by `lpc abort all`.

Does anybody successfully overcome this issue?

posted by Brahma at 10:03 AM 0 comments

door File

It's a 'door' file. Door files are used for inter-process communication.
Thanks to all who replied.
GR

> Hi guys, what type of file is the following
> Dr--r--r-- 1 root root 0 Jan 10 2005 picld_door
>
> What does the 'D' in the permissions mean?

posted by Brahma at 10:02 AM 0 comments

Panics

Log files

Related messages in /var/adm/messages are checked. Sometimes hardware and software failures are prefaced by messages written to the log file. If there are disk error messages in the log file, and UNIX(r) file system (UFS) routines listed on a stack trace, it is likely that there is a disk hardware problem.

Panics

Often, system panics can originate from software; however, it is possible to incur a panic from a hardware fault. Some panic messages that indicate hardware problems include:

Asynchronous memory error - indicates a memory problem

Asynchronous memory fault - usually indicates a bus problem between memory and CPU

posted by Brahma at 10:01 AM 0 comments

Printing Issues

The inability to cancel a job could be attributed to the contents of your /etc/inetd.conf file, specifically the in.lpd line. It has to be uncommented AND it MUST refer to "tcp6" and not tcp. You should also run the command pmadm -l and make certain there are no references to tcp services. zsmon services are OK; they are for serial ports. TCP services in this context can cause problems and should be removed.

Krishna, you report two problems. The second problem (can't cancel print jobs) compounds the first (print jobs repeating endlessly).

In most cases, the second problem is easier to fix than the first, so it's a good place to start. Martha suggested the correct entry in /etc/inetd.conf. The old-style TCP listener daemons can also interfere with the operation of the cancel command, and they are no longer used for printing. Run these commands to disable the listeners to prevent them from causing trouble:

pmadm -r -p tcp -s lpd
pmadm -r -p tcp -s lp
pmadm -r -p tcp -s 0     (that's the number zero)

See if that helps the problem with the cancel command.

The problem with the repeated print jobs is something I've seen happen with the HP Jetdirect/HPPI software. If you look in the file /opt/hpnpl/README.UNX, you'll find some very interesting reading in Section III:

************************************
III. ISSUES/SOLUTIONS/WORKAROUNDS
************************************
...


3. Issue: Job gets printed more than once on LaserJet 8500, 1100 and 2500CM printers.

Solution: Turn off Job recovery for the printer queue. In this case, jobs will not get recovered if there is a power failure.
...
6. Issue: Large job prints again and again and gives "Connection reset by peer" error in LJ4500 when time out is set to 90sec in printer front panel.

Workaround: Increase the idle-timeout value by increments of 90sec on the HP Jetdirect. If the maximum value of 3600sec does not resolve the issue, then try 0sec.

7. Issue: Jobs are printed multiple times by hpnpf if the printer is offline for a long time.

Workaround: Increase the idle-timeout value by increments of 90sec on the HP Jetdirect. If the maximum value of 3600sec does not resolve the issue, then try 0sec.

Perhaps one of those workarounds suggested by HP will help...

Since I received some replies to my previous summary, I include a second summary including them, since they may be of interest to some of you.

Basically:

Tim Henrion said:

I missed your original question when it was sent to the list. We had this same problem and fixed it the same way as you did. The problem was caused by the listener not running. Check out the man pages for listen and nlsadmin to learn more.

And Mathew Stier was kind enough to include the setup procedure to manually define a printer on Solaris 2.5 (which I include at the end of this mail).

Definitely, it seems that some of these steps are performed by admintool but not by jetadmin install printer.

Hope this summary is useful for someone else out there.
Mariel

ORIGINAL POST

I hope this is an easy and stupid thing that I am doing.
I am trying to install an HP 4V network printer.
I installed jetadmin.
I defined the printer using jetadmin (all the files under /etc/printers are created ok).
When I type lpq, after waiting for a long time, I get the following message:

could not talk to print service at 'machine_name'

When I try to print something using lpr -P printer filename, I get the following message on the console:

Error transfering print job 1
check queue for (printer_name@machine_name)

I shut down and restarted lpsched hundreds of times already, and even rebooted the machine (just in case) to no avail.
Any ideas?

FIRST SUMMARY

I am still puzzled about this problem. I don't think I found a solution, just a workaround.
If you have a machine that was never defined as a printer server before, you install jetadmin, and then define a printer via jetadmin, you get this error.
This happens no matter which version of jetadmin you use (I tried with 3 of them, including the last one, 3.15).
The workaround is to define another printer (even a non-existing one) using admintool.
This makes lpq and the rest of the commands start working, and you are able to use the network printer without any problems, even if afterwards you delete that extra printer.
I think that the problem might be something that is done when admintool defines a printer that the "add printer" of jetadmin doesn't do, but I do not know what it is.

PRINTER SETUP PROCEDURE

INFODOC ID: 2097


SYNOPSIS: Solaris 2 printer configuration commands - quick reference

DETAIL DESCRIPTION:

Setting up local printers under Solaris 2.x.

Some of these steps are only for information: they're commented out. Most of this has to be done as root.

Format is English description, then the 2.x command on the right.

Check general printer status lpstat -s

If not running, start /etc/init.d/lp start

# To stop all print services /etc/init.d/lp stop

Show all printers configured lpstat -p all

Show a particular printer lpstat -p myprinter

Show a particular printer in detail lpstat -p myprinter -l

Check port monitor services pmadm -l

Turn off the portmonitor on ttyb        pmadm -d -p zsmon -s ttyb
                                        (pmadm FLGS changes to 'ux')

# To turn ON the portmonitor on ttyb    pmadm -e -p zsmon -s ttyb
#                                       (pmadm FLGS changes to 'u')

Add a new printer on ttyb               cd /dev/term
                                        chgrp lp b; chown lp b; chmod 660 b
                                        lpadmin -p myprinter -v /dev/term/b

Send the output to a file instead       touch /tmp/printout
                                        chmod 666 /tmp/printout   # temporary
                                        lpadmin -p myprinter -v /tmp/printout

# Remove a printer lpadmin -x myprinter

To enable enable myprinter


To accept jobs accept myprinter

To check status (again !) lpstat -p myprinter

To print something lp -d myprinter /etc/passwd

Allow normal users to print without     lpadmin -p myprinter -o nobanner
banners (otherwise, only root can
print without banners)

Fix up the serial line settings         lpadmin -p myprinter \
(still nobanner)                          -o "nobanner stty='cs8 19200'"

Add a description                       lpadmin -p myprinter -D 'My new printer'

Make it the system default destination lpadmin -d myprinter

At this point it should be possible to print a job and have it appear either in /tmp/printout (if that's the output device) or on the printer.

Adding filters

The scheduler attempts to match the type of job being printed to the type of the printer. If it does not find one, it prints an error message:

# lp -d myprinter -T ps /etc/passwd
UX:lp: ERROR: There is no filter to convert the file content.
TO FIX: Use the lpstat -p -l command to find a printer that can handle the file type directly, or consult with your system administrator.

# To force a filter to be applied to a job, define:
#   - the type of the input job
#   - the type of data which the printer can accept
# To print HPGL to an OSE6450 printer, the HPGL must be followed by the string '\033%-12345x' (the first character is ESCAPE). This can be accomplished as follows:

Define the printer as taking ose data   lpadmin -p myprinter -I ose
The printer will only print data type ose: the scheduler must find a filter to print other types of data to this printer.

Make a directory for local filters      mkdir /usr/lib/lp/local

Add a suitable filter:
    cd /usr/lib/lp/local
    cat > hpgl2ose
    #!/bin/sh
    # echo anything to send before the job
    cat -
    echo '\033%-12345'
    exit 0
    ^D

Make it executable, etc.:
    cd /usr/lib/lp/local
    chmod 755 hpgl2ose
    chown lp hpgl2ose; chgrp lp hpgl2ose

Add a suitable filter definition:
    cd /etc/lp/fd
    cat > hpgl2ose.fd
    # comments here
    Input types: hpgl
    Output types: ose
    Printer types: any
    Printers: any
    Filter type: fast
    Command: /usr/lib/lp/local/hpgl2ose
    Options: PRINTER * = -p*
    ^D

    chown lp hpgl2ose.fd
    chgrp lp hpgl2ose.fd
    chmod 664 hpgl2ose.fd

Add it to the list of filters the       lpfilter -f hpgl2ose \
system knows about                        -F /etc/lp/fd/hpgl2ose.fd


Check the filters that the system       lpfilter -f all -l
recognizes

Check just one filter                   lpfilter -f hpgl2ose -l

# Remove a filter                       lpfilter -f hpgl2ose -x

Check the output from the spooler       lpadmin -p myprinter -v /tmp/printout
(printer is default dest)               cp /dev/null /tmp/printout
                                        lp -T hpgl -o nobanner /etc/passwd
                                        cat -v /tmp/printout

Check that this file will print ok      sleep 30000 </dev/term/b &
if required (stty settings will vary)   stty cs8 19200 </dev/term/b
                                        cp /tmp/printout /dev/term/b

Check the output tray, then:            ps -efl | grep 30000
                                        kill sleep-pid

Point the printer back at the           lpadmin -p myprinter -v /dev/term/b
serial line

Configuring a print server

Format is English description, then the 2.x command below (since the commands are too long to go on the same line).

Install patch 100863 and append this line to /etc/lp/Systems:

+:x:-:s5:-:n:10:-:-:Allow all connections

Check it works:

# lpsystem -l
System:                     +
Type:                       s5
Connection timeout:         never
Retry failed connections:   after 10 minutes
Comment:                    Allow all connections
#


Identify the "listen" port monitor version number (typ. 4)

# nlsadmin -V4#

Show all port monitors: this shows typical output:

# sacadm -lPMTAG PMTYPE FLGS RCNT STATUS COMMANDzsmon ttymon - 0 ENABLED/usr/lib/saf/ttymon ##

Show tcp "listen" port monitors

# sacadm -l -p tcpInvalid request, tcp does not exist#

Add a listen port monitor if not present

# sacadm -a -p tcp -t listen -c "/usr/lib/saf/listen tcp" -v 4

Show tcp "listen" port monitors

# sacadm -l -p tcpPMTAG PMTYPE FLGS RCNT STATUS COMMANDtcp listen - 0 STARTING/usr/lib/saf/listen tcp#

#

Get the print server's universal address (in hex)

# lpsystem -A00020203819c88fb0000000000000000#

Add aBSD listener, if required. The '4' is from 'nlsadmin -V' above.

# pmadm -a -p tcp -s lpd -i root -v 4 -m `nlsadmin -o/var/spool/lp/fifos/listenBSD -A"\x00020203819c88fb0000000000000000"`#


or (in sh, if you cannot cut and paste)

# ad=`lpsystem -A`
# pmadm -a -p tcp -s lpd -i root -v 4 -m `nlsadmin -o /var/spool/lp/fifos/listenBSD -A "\x$ad"`
#

Check that this has taken effect

# pmadm -l -p tcp -s lpd
PMTAG   PMTYPE   SVCTAG   FLGS   ID     <PMSPECIFIC>
tcp     listen   lpd      -      root   \x00020203819c88fb0000000000000000 - p - /var/spool/lp/fifos/listenBSD #
#

To remove the listener at any time, use

# pmadm -r -p tcp -s lpd

To add System V clients, add the STREAM service used for print requests.

The '4' is from 'nlsadmin -V' above.

# pmadm -a -p tcp -s lp -i root -v 4 -m `nlsadmin -o /var/spool/lp/fifos/listenS5`
#

Check that this has taken effect

# pmadm -l -p tcp -s lp
PMTAG   PMTYPE   SVCTAG   FLGS   ID     <PMSPECIFIC>
tcp     listen   lp       -      root   - - p - /var/spool/lp/fifos/listenS5 #
#

Figure out the address to use for service 0, the nlps server.

# lpsystem -A
00020203819c88fb0000000000000000
#

which divides into:

0000000000000000   16 zeros for padding
819c88fb           Server's IP address in hex
0203               BSD printer port == 515
0ACE               SysV printer port == 2766
0002               INET protocol family

The BSD listener uses the 515 port as reported by 'lpsystem -A' - the System V spooler uses 2766, so the address must be changed to

00020ACE819c88fb0000000000000000

The "\x" is added to force it to be interpreted in hex.

Add service 0, the nlps server

# pmadm -a -p tcp -s 0 -i root -v 4 -m "`nlsadmin -c /usr/lib/saf/nlps_server -A '\x00020ACE819c88fb0000000000000000'`"
#

Check that this has taken effect

# pmadm -l -p tcp -s 0
PMTAG   PMTYPE   SVCTAG   FLGS   ID     <PMSPECIFIC>
tcp     listen   0        -      root   \x00020ACE819c88fb0000000000000000 - c - /usr/lib/saf/nlps_server #
#

To remove the System V listener at any time, use

# pmadm -r -p tcp -s lp
# pmadm -r -p tcp -s 0

Configuring a print client of a BSD print server

Register the print server name with the print service

# lpsystem -t bsd bsdserver
"bsdserver" has been added
#

To remove the print server, use

# lpsystem -r bsdserver
Removed "bsdserver".
#

To check what's been added

# grep bsdserver /etc/lp/Systems
bsdserver:x:-:bsd:-:n:10:-:-:
#

Add the printer to the client system. This information is entered into the directory /etc/lp/printers by the lpadmin command.

# lpadmin -p local_prt_name -s bsdserver!rmt_prt_name

To remove the printer, use

# lpadmin -x local_prt_name

Set the printer's content and type

# lpadmin -p local_prt_name -T unknown -I any

Allow queuing

# accept local_prt_name

Enable the printer

# enable local_prt_name

Check the configuration

# lpstat -p local_prt_name -l
printer local_prt_name is idle. enabled since Sun May 9 17:50:15 BST 1993. available.
Content types: any
Printer types: unknown
Description:
Users allowed:
        (all)
Forms allowed:
        (none)
Banner not required
Character sets:
        (none)
Default pitch:
Default page size:

Configuring a print client of a System V print server

Register the print server name with the print service

# lpsystem -t s5 sys5server
"sys5server" has been added
#

To remove the print server, use

# lpsystem -r sys5server
Removed "sys5server".
#

To check what's been added

# grep sys5server /etc/lp/Systems
sys5server:x:-:s5:-:n:10:-:-:
#

Add the printer to the client system. This information is entered into the directory /etc/lp/printers by the lpadmin command.

# lpadmin -p local_prt_name -s sys5server!rmt_prt_name

To remove the printer, use

# lpadmin -x local_prt_name

Set the printer's content and type

# lpadmin -p local_prt_name -T PS -I PS

Register bundled PostScript filters using the lpfilter command. From sh:

# cd /etc/lp/fd
# for filt in `ls | sed 's/\.fd//'`
> do
> lpfilter -f $filt -F $filt.fd
> done
#

Check the filters have been added

# lpfilter -f all -l

Allow queuing

# accept local_prt_name

Enable the printer

# enable local_prt_name

Check the configuration

# lpstat -p local_prt_name -l
printer local_prt_name is idle. enabled since Sun May 9 17:50:15 BST 1993. available.
Content types: any
Printer types: unknown
Description:
Users allowed:
        (all)
Forms allowed:
        (none)
Banner not required
Character sets:
        (none)
Default pitch:
Default page size:

Attention print queue gurus --

I have a few HP printers queued on my 2.6 Solaris Ultra-60 box, configured using jetadmin from HP. They've been working fine for many months (some for years), until yesterday.

A user tried to print a file yesterday, and it is now repeatedly coming out on their printer. They've had to shut the printer off.

I tried to cancel the print job with `cancel -u cbartley', and after about twenty seconds I get:

could not talk to print service at biocomp

which repeats another three times.

I get the same error message (although just once) with `lpq -P papowers'.


I tried stopping and starting the scheduler with `lpshut' and `/usr/lib/lp/lpsched'. No difference.

In /var/spool/lp/logs/lpsched, I see:

11/28 14:24:35: printer fault. type: write root, status: 14
msg: (exec exit fault)

I searched DejaNews and came up pretty much empty-handed. More info:

---------------------------------
A `ps -ef | grep lp' yields:
---------------------------------
cbartley 11950 11949 0 10:58:54 ?  0:00 /bin/sh -c /etc/lp/interfaces/papowers papowers-67 biocomp!cbartley "67-1" 1 "
root     11949 11943 0 10:58:54 ?  0:00 /usr/lib/lpsched
cbartley 11973 11972 0 10:58:54 ?  0:00 /bin/sh -c /etc/lp/interfaces/papowers papowers-67 biocomp!cbartley "67-1" 1 "
cbartley 11996 11975 0 10:58:55 ?  0:00 cat /var/spool/lp/tmp/biocomp/67-1
root     11943     1 0 10:58:54 ?  0:00 /usr/lib/lpsched
cbartley 11972 11951 0 10:58:54 ?  0:00 /usr/spool/lp/bin/lp.tell papowers
cbartley 11951 11950 0 10:58:54 ?  0:00 /bin/sh -c /etc/lp/interfaces/papowers papowers-67 biocomp!cbartley "67-1" 1 "
cbartley 11975 11974 0 10:58:54 ?  0:00 /usr/bin/sh /etc/lp/interfaces/model.orig/papowers papowers-67 biocomp!cbartley

---------------------------------
A `lpstat -t' yields:
---------------------------------
scheduler is running
system default destination: hp3147
system for _default: biocomp (as printer hp3147)
device for hp2116: /dev/hp2116
device for hp3210: /dev/hp3210
device for hp3147: /dev/hp3147
device for papowers: /dev/papowers
hp3147 accepting requests since Fri Dec 18 14:01:06 CST 1998
hp2116 accepting requests since Mon Apr 19 15:04:57 CDT 1999
hp3210 accepting requests since Fri Dec 18 14:02:33 CST 1998
papowers accepting requests since Mon Jul 10 14:42:10 CDT 2000
printer hp3147 is idle. enabled since Tue Apr 20 10:19:38 CDT 1999. available.
printer hp3210 is idle. enabled since Fri Dec 18 14:02:33 CST 1998. available.
printer hp2116 is idle. enabled since Mon Apr 19 15:04:57 CDT 1999. available.
printer papowers now printing papowers-67. enabled since Wed Nov 29 08:08:31 CST 2000. available.
papowers-67            cbartley      163135   Nov 28 14:20 on papowers

---------------------------------
A `lpstat -p papowers -l' yields:
---------------------------------
printer papowers now printing papowers-67. enabled since Wed Nov 29 08:08:31 CST 2000. available.
Form mounted:
Content types: simple
Printer types: unknown
Description:
Connection: direct
Interface: /usr/lib/lp/model/net_ljx000
After fault: continue
Users allowed:
        (all)
Forms allowed:
        (none)
Banner required
Character sets:
        (none)
Default pitch:
Default page size:
Default port settings:

Any ideas about how I can fix this?

--
Jim Winkle, UNIX System Administrator, UW-Madison, DoIT. Contact info:

BioComp: [email protected]  http://www-biocomp.doit.wisc.edu/
Other:   [email protected]  http://jwinkle.doit.wisc.edu/

Problem restatement:
---------------------
A user printed a file once, but the file continued to come out of their printer, consuming much toner and a small forest. Trying to cancel the print job with `cancel' yielded (after a 30 second delay):

could not talk to print service at biocomp

Solution:
---------------------
# lpshut
# rm -r /var/spool/lp/tmp /var/spool/lp/temp /var/spool/lp/requests
# /usr/lib/lpsched

Starting lpsched recreated those directories I deleted.

I'm still getting the error message (and 30 second delay):

could not talk to print service at biocomp

with cancel and lpq, but this may be a separate problem. 'biocomp' is the name of the UNIX box on which the printer is queued (the same box as I am executing cancel/lpq on). If anyone has any ideas, send 'em my way.

Cause of the problem:
---------------------
Most likely caused by a network hangup, which lasted longer than the printer timeout. Here's how it should work. The spooler sends the print job to the printer, which gets broken up into tcp packets. The printer accepts these packets as it can, handshaking with the server.

If there is REALLY slow network response (as in a network storm), the server doesn't get the response in time, assumes the printer didn't get the file, and resends it. The printer itself prints the original, and the retransmission of it. Things can get pretty confused, and the spooler can end up sending endless copies of part of or a whole file.

Thanks much to respondents!

"Could not talk to print service" errorsAll 2 messages in topic - view as tree

From: [email protected]
Newsgroups: comp.sys.sun.admin
Date: Thu, 07 Sep 2000 22:26:33 GMT
Subject: "Could not talk to print service" errors

I am trying to move a print server from one machine to another. On the new machine, which is running Solaris 8, I get the following error when I type in lpq:

Could not talk to Print Service at <hostname>

lpstat -a shows me that all the printers are accepting requests. I can print to the printers from workstations around. I cannot seem to get the queue status. /usr/lib/lpsched is running and all printers have been enabled.

What am I missing? Any suggestions would help.

Thanks

Kiran

Greg Andrews

<hostname> is probably not accepting connections on TCP port 515, the printing port. Or you have a packet filter that was opened to allow the old print server to connect to the printers, but not the new one.

-Greg

John Martin

Ah the joys of the Solaris print spooler.

It's been a while since I did this so I might be a bit rusty, here goes.

Shut down the spooler with lpshut

cd to /var/spool/lp/requests/<hostname>

Within this directory will be the request file which indicates to the spooler that there is a request waiting to be serviced. It will be named with the request number (67-0 I think). If you delete or move this file somewhere else then it effectively removes the request from the queue.

cd to /var/spool/lp/tmp/<hostname>

Within this directory will be a temporary file which holds details about the print job, including the name of the file being printed. It too will be named with the request number (67-0 I think). Move this file elsewhere in case you need to refer to it later.

Do a ps and grep for lp just to make sure they're all gone.

Restart the spooler and do an lpstat to check the queue entries.

If your spooler makes a copy of the print file before printing it then you may need to manually remove it; refer to the request details saved above.

As for your original problem, I have seen this sort of behaviour exhibited on some of our printers in the past. In our case it was down to slow network response and the setting of the printer timeout. Basically what happens is that the spooler sends the request to the printer; if the print file is large enough then it will wait for a response to say that it has started printing and to send more please. Due to slow network response the reply doesn't get through in time so the spooler does a retry thinking that it has failed the first time etc etc.

I'm sure that you can up the timeout in the jetdirect interfaces through jetadmin so try looking there.

Hope this helps.

JJ

Here are the two ways for Solaris 2.6 and later, printing to an HP network printer that supports Postscript:

1:

lpadmin -p printername -v /dev/null -m netstandard -o protocol=bsd -o dest=printer-ip-address -T PS -I postscript

enable printername
accept printername

Note that the last argument to the lpadmin command is -I, which is a capital letter "eye", not a lowercase letter "ell" or the number "one".

If the HP printer doesn't support Postscript, change the last two arguments in the lpadmin command to:

-T hplaser -I any      (if you're going to print PCL formatted files)
-T hplaser -I simple   (if you're going to print only plain text)

2:

Add an entry to /etc/hosts (or your NIS/NIS+ hosts table, or your DNS) that gives a hostname to the printer's IP address

lpadmin -p printername -v /dev/null -m netstandard -o protocol=bsd -o dest=printer-hostname -T PS -I postscript

enable printername
accept printername

See the notes above for variations of the lpadmin arguments. Notice that only the "-o dest=xxxx" argument changed between the two ways.

With HP printers, it's usually best to download and install HP's Jetadmin software for Solaris. It supports more of the printer's features than the stock Solaris print system does. Then add the printer through Jetadmin's text interface instead of typing the above commands.

-Greg--

Fil Krohnengold

I've gotten some help, but I'm still having trouble. Here's what I'm doing:

research:~> lpadmin -p netsyslp2 -m netstandard -v /dev/null -o protocol=bsd -o dest=the_real_ip_addr -T PS -I postscript

I've actually tried this a few times without the -T and -I options. So here's where I get confused:

research:~> enable netsyslp2
destination "netsyslp2" now accepting requests
research:~> accept netsyslp2
UX:accept: WARNING: Destination "netsyslp2" was already accepting requests.
[Hmm...]
research:~> lpstat -p netsyslp2
printer netsyslp2 disabled since Sat Nov 6 23:35:27 EST 1999. available.
        new printer

And that's it. The printer stays "disabled". And if I send a job to it, it's stuck in the queue now forever.

research:~> lpstat -R
1 netsyslp-1    fil    4924   Nov 04 15:22
1 netsyslp1-2   fil    4924   Nov 04 16:10
research:~> cancel netsyslp1-2
could not talk to print service at research
research:~>

Tried restarting lpsched too. Sad.


Anyone?

-fil

I'm trying to configure a printer spool on a SUN Sparc running SunOS 5.5.1 and am receiving the following error when trying 'lpq -P<queuename>':

could not talk to print service at <print server name>

I've tried both local and remote spools and get the same error (only the server name changes appropriately).

However, this problem doesn't occur for the superuser -- only regular users get this problem. This seems to indicate that some file permissions are set incorrectly. I checked for 'r-x' permissions for everybody in the following places:

/var/spool/print
/etc/lp
/etc/lp/model
/etc/lp/interfaces

Does anyone know what my problem is and how I can fix it?

Thanks,

Trevor Wood

Suggestion: this is Solaris, learn the native commands.

lpadmin, lpstat, lpsystem.

try lpstat -o printername

Then if that doesn't work, try using

truss lpstat -o printername

to find out where it is bombing

-

Sorry that I lost the original thread. To work as a print server, you need to configure /etc/lp/Systems with lpsystem, and SAF with sacadm and pmadm. Did you do all these?

The spec in RFC1179 for LPD ensures that any LPD connections must be in the range of 721 - 731 inclusive. If your `lpq' is executed at the user level and


is an lpq that speaks to a REMOTE LPD, then your connection will be refused because of user-level port restrictions.

Most often lpq's talk to the local LPD, which is running as root, which in turn talks to the remote LPD.

Ensure that your version of lpq does in fact talk to the local LPD, not the remote LPD. Otherwise, you may indeed have to suid your lpq binary. I've heard of some lpq's just referencing the printcap for the rm,rp entries instead of talking to a local LPD. This IMHO is more an error with LPD RFC1179 than anything else.

I don't know if this is it, but try it and find out.

joeh I thought lpd is listening at port 515 ??

-

joeh wrote:
> I thought lpd is listening at port 515 ??

> "J. S. Jensen" wrote:
> > The spec on RFC1179 for LPD ensures that any LPD connections must be in the
> > range of 721 - 731 inclusive. If your `lpq' is executed at the user level and

Yes, the listening port IS at 515; however, the RECEIVING CLIENT port (yes, it is specified, if you can believe that) must be 721-731.

> > is an lpq that speaks to a REMOTE LPD, then your connection will be refused
> > because user-level port restrictions.

Unfortunately this is why a lot of LPD queries fail.

My /etc/services file has this line:

printer 515/tcp spooler # line printer spooler

Do I also need to add the client statement to this file?
Just checking before I presume.
Does anyone have a copy of the line in their /etc/services file?
Using SunOS 5.7 ("Solaris 7").
Thanks to anyone willing to help.
Joe


joeh wrote:
> My /etc/services file has this line:
> printer 515/tcp spooler # line printer spooler
> Do i also need to add the client statement to this file?

No, not at all. As you well know, the services file allows for particular daemons to reference port numbers by names.

The daemon LPD itself checks the connecting client port. Or at least it /should/ according to spec. I've personally written LPDs where I ignore the client port restriction.

--

>My /etc/services file has this line:

>printer 515/tcp spooler # line printer spooler

> Do i also need to add the client statement to this file?
> Just checking before i presume
> Does anyone have a copy of the line in their /etc/services file?
> Using SunOS 5.7 ("solaris 7")

No, you don't need to add anything else to that file. Both servers and clients only use the first two fields, which are the same for both.

-Greg

posted by Brahma at 9:46 AM 0 comments

Printer prints, but can't see queue.

Subject: Printer prints, but can't see queue.

Hi,
My printer (an Epson Stylus inkjet) prints from my Sun (Ultra 80, Solaris 9) okay, using Ghostscript to convert the PostScript to Epson's language. I use the parallel port on the U80.

Something like


% cat file.ps | lp

works fine and prints okay using Ghostscript, and

% cat file.epson | lp -d st850_direct

will put the file directly to the printer.

However, lpq will never show what is in the queue - either of them. This used to work, but I noticed it failed to work some months ago.

Can anyone give me a clue where to start looking for the problem?

Leslie Hayward

What do you get from lpstat -t?

sparrow /export/home/davek % lpstat -t
scheduler is running
system default destination: st850_ps
system for _default: sparrow (as printer st850_ps)
device for st850_ps: /dev/ecpp0
device for st850_direct: /dev/ecpp0
_default accepting requests since Sat Nov 2 14:36:53 GMT 2002
st850_ps accepting requests since Sat Nov 2 14:36:53 GMT 2002
st850_direct accepting requests since Sat Nov 2 14:37:29 GMT 2002
printer st850_ps is idle. enabled since Sat Nov 2 14:36:53 GMT 2002. available.
printer st850_direct is idle. enabled since Sat Nov 2 14:37:29 GMT 2002. available.

But lpq says it can't talk to the print service.

sparrow / # /usr/ucb/lpq
could not talk to print service at sparrow

(I've tried that both as a normal user and as root in the above example).

David, this doesn't answer your question, but you may be able to help me with mine. I see you've got a printer on the parallel port. How did you configure it? All my attempts have ended in printless silence, and reliance on the network printer....

I'm no expert on this and am not 100% convinced it is okay now, but it works (minus my ability to read what's in the queue) in the single user environment. I took some of the information on setting up Ghostscript via the web, from an earlier revision of http://cfauvcs5.harvard.edu/SetGSprinter4Solaris.html, which someone at Sun said was wrong, since my filter was printing to the parallel port and not to standard output. However, despite this, the queue worked then. Later I changed things to how someone at Sun said it should be, and they were better, but the queue which is my present problem was never an issue until relatively recently (perhaps since I installed Solaris 9).

(Note I did point out to the author of the web page above about this issue with printing to stdout rather than the port itself. You can see he has made a note about it, but finds it more troublesome himself.)

Perhaps I should have another read of the documentation myself, since this may be why my queue is not working.

I don't know if you have a PostScript printer or not. I don't have a PostScript printer, so I'm not sure how helpful the following is, but I've installed two queues as you see above. st850_ps goes via Ghostscript, so converts the PostScript output by 99% of Solaris programs to Epson control commands. The other queue, st850_direct, connects directly to the printer, with no processing. st850_direct is set up so I can print via Samba from a PC to the printer on the Sun.

1) To set up the direct queue, with no processing.

# ls -l /dev/ecpp0
lrwxrwxrwx   1 root  root  50 Sep 14 17:03 /dev/ecpp0 -> ../devices/pci@1f,4000/ebus@1/ecpp@14,3043bc:ecpp0

# ls -l /devices/pci@1f,4000/ebus@1/ecpp@14,3043bc:ecpp0
crw-rw-rw-   1 root  sys  64, 0 Nov 2 20:47 /devices/pci@1f,4000/ebus@1/ecpp@14,3043bc:ecpp0

lpadmin -x st850_direct
lpadmin -p st850_direct -v /dev/ecpp0 -T unknown -I any
enable st850_direct
accept st850_direct

2) To set up the queue printing via Ghostscript, I essentially followed the instructions at http://cfauvcs5.harvard.edu/SetGSprinter4Solaris.html

I've set up a script for this, which I copied below.

# 1) remove any printer under the name st850_ps
lpadmin -x st850_ps

# 2) define filter description /etc/lp/fd/PS_to_EPSON.fd

# 3) Make ownership correct on filter
chown lp:lp /etc/lp/fd/PStoEPSON.fd


# 4) Define the printer name and the port device the printer will use
# According to the Answerbook, the device to use is /dev/null.
/usr/sbin/lpfilter -f PStoEPSON -F /etc/lp/fd/PStoEPSON.fd

# 5) Register the printer
#/usr/sbin/lpadmin -p st850_ps -v /dev/null -I ESC2
/usr/sbin/lpadmin -p st850_ps -v /dev/ecpp0 -I ESC2

# 6) ENABLE PRINTER
accept st850_ps
enable st850_ps

# 7) DISABLE THE BANNER AND SET SPEED ON
# the parallel port to 19200 bits/s.
lpadmin -p st850_ps -o "stty=19200" -o nobanner

# 8) MAKE IT THE DEFAULT QUEUE
lpadmin -d st850_ps

# 9) CHECK IT IS OKAY
lpstat -t st850_ps

# 10) Make /dev/ecpp0 writable by anyone if necessary,
# which it was in my case.
chmod 666 /dev/ecpp0
# Now replace nobanner="no" with nobanner="yes" in /etc/lp/interfaces/st850_ps
/usr/local/bin/replace 'nobanner="no"' 'nobanner="yes"' /etc/lp/interfaces/st850_ps

and the filter looks like this:

sparrow /etc/lp # more /etc/lp/fd/PStoEPSON.fd
Input types: postscript
Output types: ESC2
Printer types: any
Printers: any
Filter type: fast
Options: MODES fast = -f
Options: MODES slow = -s
Options: MODES photo = -p
Options: MODES high = -h
Options: MODES normal = -n
Command: /usr/local/bin/ps_to_epson850


I realised I forgot something from what is in the page at http://cfauvcs5.harvard.edu/SetGSprinter4Solaris.html - I print via a script that allows me to set the resolution.

sparrow /etc/lp/fd # more ./PStoEPSON.fd
Input types: postscript
Output types: ESC2
Printer types: any
Printers: any
Filter type: fast
Options: MODES fast = -f
Options: MODES slow = -s
Options: MODES photo = -p
Options: MODES high = -h
Options: MODES normal = -n
Command: /usr/local/bin/ps_to_epson850

The command file /usr/local/bin/ps_to_epson850 has in it:

#!/bin/sh
# this is /usr/local/bin/ps_to_epson850
# default to standard resolution
RES="@stc800p.upp"
# select high resolution when requested
if [ "$1" = "high" ]; then
RES="@stc800ih.upp"
fi
# run gs with print filtering options (writes to stdout)
# OK exec /usr/local/bin/gs -q ${RES} -dBufferSpace=250000000 -sPAPERSIZE=a4 -sOutputFile=- - -c quit
exec /usr/local/bin/psselect -r | /usr/local/bin/gs -q ${RES} -dBufferSpace=250000000 -sPAPERSIZE=a4 -sOutputFile=- - -c quit

Greg Andrews

Sparrow is the name of the machine you typed the lpstat command on, and it has in.lpd disabled in its /etc/inetd.conf file, or you have the old TCP listeners interfering with inetd binding to the printer port.

To find out if the old TCP listeners are running, type:

pmadm -l -p tcp

To delete them when they're running, type:


pmadm -r -p tcp -s lpd
pmadm -r -p tcp -s lp
pmadm -r -p tcp -s 0    <-- that's the number zero

-Greg--

The parallel port device depends on which server/workstation you have. How well printing works depends on the make/model printer you have.

Can you supply those two bits of information? The version of Solaris you're using would also be nice to know...

Thanks a lot Greg, I had indeed commented out:

printer stream tcp6 nowait root /usr/lib/print/in.lpd in.lpd

in /etc/inetd.conf

I tried commenting out what I could for security reasons and obviously went one too far. I think I checked before that printing was still working okay after commenting the line out. Clearly I did not go as far as to check that one could see the queue.

Lpstat could see the queue. It was just /usr/ucb/lpq that couldn't. Lpstat knows it can talk to lpsched via lpsched's FIFO (aka "named pipe") in the filesystem. /usr/ucb/lpq thinks it must make a network connection and talk to in.lpd to get the status.

I would suggest switching from the BSD commands (lpr, lpq, lpc, lprm) to the SVR4 commands (lp, enable/disable/accept/reject, lpstat, cancel). The only SVR4 command that needs lpd to be running is cancel.

-Greg--
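As a side note, a minimal sketch of re-enabling the in.lpd listener after un-commenting the line quoted above (pkill -HUP simply makes inetd reread its configuration):

# grep in.lpd /etc/inetd.conf
printer stream tcp6 nowait root /usr/lib/print/in.lpd in.lpd
# pkill -HUP inetd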

posted by Brahma at 9:42 AM 0 comments

Parsing a date field

Parsing a date field.

Subject: Parsing a date field.

Hi,


I want to be able to extract selected lines from a logfile, where each line is tagged with today's date.

e.g

Mon Jul 25 13:58:15 2005: Loading file ...
Mon Jul 25 13:58:16 2005: Deleting all data from tblCusipMap

Say that today is Mon Jul 25 2005. How can I extract all of today's lines from the file (ignoring the timestamp, but including the year) without getting too messy?

Cheers!


John L Jul 25, 4:37 am

You need to grep today's date from the file, and you can get today's date using the date command.

Because you need the date in two sections (the year is separate) you could run the date command twice, but that will fail around midnight on New Year's Eve, so just run it once and assign the result to shell variables.

set -- `date +'%a %b %d %Y'`
grep "$1 $2 $3 [0-9:]* $4" logfile

For ease of future maintenance, consider using meaningful variable names.

-- John.

Oops. You should add a ^ to anchor the pattern to the start of each line, just to be on the safe side.

set -- `date +'%a %b %d %Y'`
grep "^$1 $2 $3 [0-9:]* $4" logfile

-- John.
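Pulling both suggestions together, a minimal sketch (the variable names are purely illustrative, as is the logfile name):

#!/bin/sh
# grep today's lines out of a logfile, anchored to the start of each line
set -- `date +'%a %b %d %Y'`
weekday=$1 month=$2 day=$3 year=$4
grep "^$weekday $month $day [0-9:]* $year" logfile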



Chris F.A. Johnson

grep "`date "+^%a %b %d [0-9][0-9]:[0-9][0-9]:[0-9][0-9] %Y:"`" LOGFILE

William James

ruby -nae 'BEGIN{$a=Time.now.to_s.+(":").split;$a.slice!(3,4)};print if$a==$F.values_at(0,1,2,4)'

gawk '$1$2$3$5==strftime("%a%b%d%Y:")' file

John W. Krahn

perl -ne'BEGIN { ( $today = localtime ) =~ s/\d+:\d+:\d+/\\d+:\\d+:\\d+/ } print if /^$today/o' logfile

posted by Brahma at 9:38 AM 0 comments

Friday, July 29, 2005

Panics

Log files

Related messages in /var/adm/messages are checked. Sometimes hardware and software failures are prefaced by messages written to the log file. If there are disk error messages in the log file, and UNIX(r) file system (UFS) routines listed on a stack trace, it is likely that there is a disk hardware problem.
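As a rough starting point for that check (the patterns are illustrative, not exhaustive), something like this pulls recent disk/UFS-related entries out of the log:

# egrep -i 'ufs|sd[0-9]|ssd[0-9]|scsi' /var/adm/messages | tail -20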

Panics

Often, system panics originate from software; however, it is possible to incur a panic from a hardware fault. Some panic messages that indicate hardware problems include:

Asynchronous memory error - Indicates a memory problem

Asynchronous memory fault - Usually indicates a bus problem between memory and CPU

posted by Brahma at 3:42 PM 0 comments

netstat


The netstat Command

The netstat command returns the contents of network data structures and tables, including status of active sockets, interfaces, routing tables, and DHCP (Dynamic Host Configuration Protocol). Some useful options are listed here.

# netstat -i

For each interface, this command displays the number of input and output packets, errors, collisions, and number of requests in the queue. From this, the collision rate can be calculated by dividing the number of collisions by the number of output packets and multiplying by 100. Ideally, this value should not exceed 5 to 10 percent.
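A small sketch of that calculation (the column positions assume the usual Solaris netstat -i layout, with Opkts in column 7 and Collis in column 9):

netstat -i | awk 'NR > 1 && $7 > 0 { printf "%-8s %.2f%% collisions\n", $1, ($9 / $7) * 100 }'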

posted by Brahma at 3:41 PM 0 comments

Which process on your system is consuming the most memory

7. Enter the following command to determine which process on your system is consuming the most CPU time, and which process on your system is consuming the most memory:

# /usr/ucb/ps -aux

Usually, in X Windows applications, dt, and fsflush are the highest resource users.
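If the raw listing is too long, one hedged refinement is to sort on the %CPU and %MEM columns (columns 3 and 4 of the BSD-style output):

# /usr/ucb/ps -aux | sort -rn -k 3,3 | head -10    # top CPU consumers
# /usr/ucb/ps -aux | sort -rn -k 4,4 | head -10    # top memory consumers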

posted by Brahma at 3:41 PM 0 comments

nslookup

==============================================================================
TOPIC: nslookup doing a reverse lookup of nameserver ??
http://groups-beta.google.com/group/comp.unix.solaris/browse_thread/thread/6989d1d50a7227ef
==============================================================================

== 1 of 1 ==
Date: Tues 26 Jul 2005 13:06
From: jms

Solaris8 on SPARC.

I just saw today that nslookup was doing a reverse lookup of the nameserver (an internal machine). That is, if I do "nslookup www.google.com", it tries to get the reverse lookup of 192.168.0.1 (the nameserver defined in /etc/resolv.conf). Thus, all queries fail.

Funny thing is, it was working before without this problem.

Doing an nslookup or dig on another machine that uses the same nameserver does not exhibit this problem.

Note that I have stopped nscd just to make sure.

# nslookup -d2 www.google.com
;; res_nmkquery(QUERY, 1.0.168.192.in-addr.arpa, IN, PTR)
------------
SendRequest(), len 42
HEADER:
    opcode = QUERY, id = 16288, rcode = NOERROR
    header flags: query, want recursion
    questions = 1, answers = 0, authority records = 0, additional = 0

QUESTIONS:
    1.0.168.192.in-addr.arpa, type = PTR, class = IN

------------------------
Got answer (119 bytes):
HEADER:
    opcode = QUERY, id = 16288, rcode = NXDOMAIN
    header flags: response, want recursion, recursion avail.
    questions = 1, answers = 0, authority records = 1, additional = 0

QUESTIONS:
    1.0.168.192.in-addr.arpa, type = PTR, class = IN
AUTHORITY RECORDS:
    -> 168.192.in-addr.arpa
    type = SOA, class = IN, dlen = 65
    ttl = 118 (118)
    origin = prisoner.iana.org
    mail addr = hostmaster.root-servers.org
    serial = 2002040800
    refresh = 1800 (30M)
    retry = 900 (15M)
    expire = 604800 (1W)
    minimum ttl = 604800 (1W)


------------
*** Can't find server name for address 192.168.0.1: Non-existent host/domain
*** Default servers are not available

# cat /etc/resolv.conf
nameserver 192.168.0.1

# cat /etc/hosts
<.... snip ...>
192.168.0.1     router dns
<.... snip ...>

# cat /etc/nsswitch.conf
<.... snip ...>
hosts:      files dns
ipnodes:    files
networks:   files dns
<.... snip ...>
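One way to see what is going on (not from the thread): getent honours /etc/nsswitch.conf, so it resolves the address from /etc/hosts, whereas nslookup talks straight to the DNS server and therefore needs a PTR record served there:

# getent hosts 192.168.0.1
192.168.0.1     router dns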

posted by Brahma at 3:40 PM 0 comments

pkgadd

TOPIC: pkgadd <pkg> AFTER applying a patch affecting <pkg>
http://groups-beta.google.com/group/comp.unix.solaris/browse_thread/thread/3f2de53ed4abd8db
==============================================================================

== 1 of 1 ==
Date: Tues 26 Jul 2005 09:43
From: Casper H.S. Dik

Frank Cusack <[email protected]> writes:

>Say I apply patch 1234-01 which affects 3 packages, 2 of which are
>installed. I later install package 3. Is this package now unpatched?
>Does the system know this and will it reapply the patch, or will it
>think the patch is already applied and that's it.

It will be unpatched; this is a known deficiency in the Solaris package system which somehow people have been unwilling to fix.

Casper--
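A hedged workaround sketch (the patch ID and spool path are just the thread's placeholders): check whether the patch is still listed, then back it out and reapply it so the newly added package gets patched too.

# showrev -p | grep 1234-01      # patch still shows as applied
# patchrm 1234-01                # back it out (if it can be backed out)
# patchadd /var/spool/patch/1234-01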


posted by Brahma at 3:40 PM 0 comments

Differences between BSD SunOS 4.1.X and System V Solaris 2.X Mar 1995

---------------------------------------------------------------------------
Differences between BSD SunOS 4.1.X and System V Solaris 2.X      Mar 1995
---------------------------------------------------------------------------

/usr/ucb/lpr FILE               <->  /usr/bin/lp FILE
/usr/ucb/lpr -P PRINTER FILE    <->  /usr/bin/lp -d PRINTER FILE

/usr/ucb/lpq -P PRINTER              /usr/bin/lpstat -d PRINTER
          OR                    <->  /usr/bin/lpstat -o PRINTER
/usr/etc/lpc status PRINTER          /usr/bin/lpstat -u USER

/usr/ucb/lprm -P PRINTER JOB#   <->  /usr/bin/cancel PRINTER-JOB#
                                     /usr/bin/cancel -u USER PRINTER

/usr/etc/lpc disable PRINTER    <->  /usr/sbin/reject PRINTER
/usr/etc/lpc stop PRINTER       <->  /usr/bin/disable PRINTER

/usr/etc/lpc enable PRINTER     <->  /usr/sbin/accept PRINTER
/usr/etc/lpc start PRINTER      <->  /usr/bin/enable PRINTER
/usr/etc/lpc restart PRINTER

no BSD equivalent               <->  /usr/bin/lpmove PRINTER-JOB# NEW_PRINTER

ps -aux | grep lpd ... kill <-> /etc/init.d/lp stop

/usr/lib/lpd <-> /etc/init.d/lp start

vi /etc/printcap                <->  /usr/sbin/lpadmin -x PRINTER
  - remove unwanted printer

/usr/bin/ps -aux | grep lpd     <->  /usr/sbin/lpshut
/usr/bin/kill LPD_PIDS

/usr/lib/lpd <-> /usr/lib/lp/lpsched

default printer = $PRINTER <-> default printer = $LPDEST

posted by Brahma at 3:37 PM 0 comments


Friday, July 22, 2005

MAPPING MAN MANPATH

MAPPING MAN

In Solaris, indexing the man pages:

1. Set your MANPATH variable to include all your man pages to be indexed:

Eg. #> MANPATH=/usr/man:/usr/dt/man:/opt/VRTSvxvm/man; export MANPATH

2. Use the catman command to index the man pages:

#> catman -w

Now you can use the 'whatis' command to find out what a command does, and 'man -k' to search for a keyword in the man pages.
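For example, once the windex database has been built:

#> whatis lpadmin
#> man -k printer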

posted by Brahma at 2:56 PM 0 comments

Which disk have I booted from?

Hello

I have just set up mirrored system disks on an E450 system running Solaris 8, Disksuite 4.2.1.

The server is sited remotely and I don't (at present) have remote console access.

I set up an alias for my mirrored disk and rebooted using reboot -- boot-mirror # the alias name.

I can't seem to determine from /var/adm/messages which disk I have actually booted from. Does anyone know a way of doing this?

Much Appreciated
Mark



Sivakanth Mundru Subject: Re: Which disk have I booted from?

> I have just set up mirrored system disks on an E450 system running
> Solaris 8, Disksuite 4.2.1.

> The server is sited remotely and I don't (at present) have remote
> console access.

> I set up an alias for my mirrored disk and rebooted using reboot --
> boot-mirror # the alias name.

> I can't seem to determine from /var/adm/messages which disk I have
> actually booted from. Does anyone know a way of doing this?

> Much Appreciated
> Mark

If you have set up aliases with eeprom

eeprom | grep alias

Then do a prtconf -pv | grep bootpath

You should be able to compare both the paths and tell which mirror you are booted off of.
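A hedged illustration of the comparison (the alias name and device paths below are made up for the example): if the bootpath matches the device behind the alias you booted, you came up on that half of the mirror.

# eeprom | grep boot-mirror
nvramrc=devalias boot-mirror /pci@1f,4000/scsi@3/disk@1,0
# prtconf -pv | grep bootpath
    bootpath:  '/pci@1f,4000/scsi@3/disk@1,0:a'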

posted by Brahma at 2:55 PM 0 comments

multiple successive inputs

Michael Tosch Jul 19, 2:57 pm

Subject: Re: multiple successive inputs

Edwina Mackay wrote:
> Dear all,

> I am not well-versed in shell scripting so this is probably an easy
> question. I'm using a homemade C program given to me by a colleague, which
> I invoke with some options, like this:

> ./myprogram -n -g 2


> and the program responds:

> Please enter input set:

> to which I reply something like:

> 0,3,8,12

> and it spews out the required answer.

> I would like to run the program, with the same option set, on many input
> sets, without having to sit there and type them in one after the other on
> multiple runs of the program. Let's say I create a file of input sets that
> I want to run the program on, and this file looks like this:

> 0,3,8,12
> 0,3,11,14
> 0,2,5,6
> 0,2,8,11
> 0,2,7,13

> How do I now run the program on the whole list of input sets? I want
> something like (pseudocode):

> for each LINE in myfile do
>     ./myprogram -n -g -2 [enter] LINE [enter]
> done

> Many thanks for helping me out with this. Best wishes, Edwina.

First try

./myprogram -n -g 2 <myfile

If this does not work then run this executable script:

#!/bin/sh
while read line; do
echo $line | ./myprogram -n -g -2
done <myfile

-- Michael Tosch @ hp : com

ITYM:


echo "$line" | ./myprogram -n -g -2orprintf "%s\n" "$line" | ./myprogram -n -g -2

I know there are no spaces or special characters in the posted input sample, but quoting by default is a good habit to get into.

Ed.
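Pulling the pieces together, a minimal sketch of the loop with the quoting applied (options written as the original poster invoked the program, -n -g 2; the file name is the thread's own myfile):

#!/bin/sh
# feed each input set to the program on its own run
while read line; do
    printf '%s\n' "$line" | ./myprogram -n -g 2
done < myfile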

posted by Brahma at 2:54 PM 0 comments

exec overlays the running process with the new program - the PID is maintained.

> #!/bin/sh
> echo $$ > .pidlog
> exec myprogram

> Tried this, but $$ is the pid of the script itself and not the program
> being run. Subsequent kill on $$ result doesn't stop myprogram.

I just tried this on Solaris 2.6/x86 (sorry, I haven't got access to a SPARC version any longer):

Sun Microsystems Inc.   SunOS 5.6       Generic August 1997
$ echo $$
344
$ exec ksh
$ echo $$
344
$ ps
  PID TTY      TIME CMD
  344 pts/0    0:00 ksh
$

As you can see, the exec overlays the running process with the new program - the PID is maintained. Thus, the solution proposed by Icarus will work because as long as your "myprogram" isn't itself the result of a fork, the .pidlog file will contain the PID of the exec'd program.

Are you sure you used "exec"?
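So a wrapper along these lines should leave the right PID behind (the program and pidfile paths are illustrative):

#!/bin/sh
# record our PID, then replace ourselves with the real program (same PID)
echo $$ > /var/run/myprogram.pid
exec /path/to/myprogram "$@"

Afterwards, kill `cat /var/run/myprogram.pid` stops the exec'd program itself.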

posted by Brahma at 2:53 PM 0 comments

How to make DOS look like Unix


How to make DOS look like Unix
ITworld.com, Unix in the Enterprise 7/19/2005

Sandra Henry-Stocker, ITworld.com

Ok, well maybe the subject line of this column is stretching it just a wee bit. There's no way that you can emulate more than a small fraction of Unix power at the DOS Command Prompt. But DOS has grown quite a bit since the days when it was an operating system in its own right. With pipes and redirection, a pile of commands that are roughly equivalent to their Unix counterparts (dir is to ls what type is to cat) and a little finesse, you can make your time working in DOS seem a little less like a detour through the dark ages and a little more like home.

To begin our little makeover, let's look at the typical DOS command prompt. When I open a command prompt on my Windows XP Professional laptop, I see something like this:

C:\Documents and Settings\shenry-stocker\Desktop>

Not so bad, you might be thinking, but any prompt that stretches more than halfway across MY screen is too much prompt for me. Can I change this the way I can change my prompt on a Unix system (e.g., setting PS1="\h> ")? Yes, in fact I can. For a minimalistic prompt, I like using something like this:

prompt=$G$S

This will change the DOS prompt to "> ". The $G represents the greater-than and $S is a space.

The default prompt on DOS is usually $P$G -- the current path followed by a > -- and you can change it back as easily:

prompt=$P$G

Many other variables can also be used in the prompt string:

$Q   = (equal sign)
$$   $ (dollar sign)
$T   Current time
$D   Current date
$P   Current drive and path
$V   Windows version number
$N   Current drive


$G   > (greater-than sign)
$L   < (less-than sign)
$B   | (pipe)
$H   Backspace (erases previous character)
$E   Escape code (ASCII code 27)
$_   Carriage return and linefeed

Some of these (such as $V) make a perfectly hideous and worthless prompt, but others can be advantageous at times -- such as $T$G$S when you're watching the clock on a Friday afternoon.

Defining Macros

To clear the screen in DOS, use the cls command. It works just like clear on Unix. And, if you don't like translating between Unix and DOS commands just to clear your screen, you can turn clear into a macro. Macros work like aliases in the Unix shells. To make clear equivalent to cls, for example, you would do this:

doskey clear=cls

Once you type this at the command prompt, you can use "clear" to clear your screen instead of cls -- at least until you open a new DOS command prompt.

So now let's tackle some other annoying near-equivalents. I often find myself typing "ls" at the command prompt when I meant to type "dir". It runs, but it takes a while to get started. If I create a macro to make ls equivalent to dir, on the other hand, the response is quick and I don't feel like such a klutz for not being able to keep my operating systems straight.

Another macro that I find useful is "date=date /t". Every time I type "date" at the DOS command prompt, my intention is to find out what day it is -- NOT to reset the date. By adding the /t, I avoid being prompted for a new date and having to control-C my way out of it. For the same reason, I like to add a macro that makes "time" invoke the "time /t" command.
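Defining those two macros looks like this:

doskey date=date /t
doskey time=time /t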

Though DOS lacks a pwd command, it is possible to trick it into displaying the current directory by typing the "cd" command without parameters. Unlike its Unix counterpart, the cd command in DOS does not return you to your home directory when you type it without parameters, but it does do one useful thing -- it displays the current path. With a "pwd=cd" macro, I can change my prompt to one that doesn't irritate me and still find out where I happen to be within the file system.

> pwd
C:\Documents and Settings\shenry-stocker\My Documents

Severe Limitations

Before we get too carried away with our goal of making DOS more Unix-like, I have to mention the biggest drawback to DOS macros -- they don't accept parameters. For example, you can try to make "man" equivalent to "help" and it will work, more or less, but typing "man dir" will not give you the same result as typing "help dir". Instead, it will give you the same result as typing "help" by itself.

Making Your Macros Reusable

Once you have defined a set of macros to make it a little easier to work at the DOS command prompt, you'll undoubtedly want to save them so that you can use them every time you open up a DOS command prompt. The first step in this process is to save your macros in a file. You can do this with the doskey /m command.

doskey /m > macros.txt

The doskey /m command lists your macros. The remainder of this command, as you'd expect, redirects that listing to a text file. To invoke your saved macros at a later time, you can issue this command:

doskey /macrofile=macros.txt

Of course, your version of this command might actually turn out to be more like this:

doskey /macrofile=C:\Documents and Settings\jdoe\macros.txt

Don't forget all those backward slashes!

Of course, even this one command can be something of an annoyance if you need to type it every time you open up a DOS command prompt. To reduce my level of annoyance, I put both my preferred prompt and the doskey command to invoke my macros in a batch file which I call customize.bat:

> type customize.bat
@echo off
:: set up macros and custom prompt


doskey /macrofile=C:\batfiles\macros.txt
prompt $G$S

I then put this customize.bat file somewhere on my search path (I actually added C:\batfiles to my search path). Now, anytime I open a DOS command prompt, I type "customize". Immediately, my prompt changes and my macros take effect.

If I make any changes to my macros.txt file (which I edit using the edit command and use control-F to access the edit command's menus), I have to type "customize" again. This is like sourcing a dot file and feels normal enough, so it doesn't bother me.

Here's a list of the macros I've set up so far:

> doskey /m
time=time /t
date=date /t
ll=dir
ls=dir /w
pwd=cd
c=cls
clear=cls
man=help
ifconfig=ipconfig
alias=doskey /m

No earth shakers in the list, but each command makes me a little more comfortable working in DOS.

posted by Brahma at 2:52 PM 0 comments

How to debug a very high workload

How to debug a very high workload ?

While the system is in the state where it uses >90% kernel cpu time, run the lockstat command to produce a basic kernel profile, e.g. while a "sleep 15" command is running:

# lockstat -kIW -D 20 sleep 15

What functions are listed as top kernel cpu time consumers?

I have reproduced the problem very carefully. The load starts to increase whenever I remove some very big files. For example, 5 files which take up approx. 24Gb together. This happens on the disks which are not part of the RAID1. Whenever this happens these are the load consumers:

1882  83%  83% 0.00  2713647  cpu[0]  logmap_cancel
 203   9%  92% 0.00  4762898  cpu[0]  fakesoftint_return
  22   1%  93% 0.00  5909501  cpu[0]  cpu_halt

I've already tried to look into logmap_cancel but unfortunately SunSolve seems to be down (I can't log in, "500 - internal server error") and Google doesn't help me much here either, so I hope this makes a little more sense to you.

Lion-O <[email protected]> writes:

>> # lockstat -kIW -D 20 sleep 15

>> What functions are listed as top kernel cpu time consumers?

> I have reproduced the problem very carefully. The load starts to
> increase whenever I remove some very big files. For example; 5 files
> which take up approx. 24Gb together. This happens on the disks which are
> not part of the Raid1. Whenever this happens these are the load
> consumers:

> 1882  83%  83% 0.00  2713647  cpu[0]  logmap_cancel
>  203   9%  92% 0.00  4762898  cpu[0]  fakesoftint_return
>   22   1%  93% 0.00  5909501  cpu[0]  cpu_halt

Yes, I can reproduce it on S10 x86 GA (but not on S10 sparc GA), with a standard UFS logging filesystem (40G total, 18G free space), when deleting three 5GB files:

% mkfile 5G a b c
% sync
% lockfs -f .

System is idle, no disk activity at this point. Then:

% rm a b c
%

After "rm" has exited, kernel cpu time usage rises to > 90% and alockstat kernel profile shows that logmap_cancel() is responsible forthe kernel's cpu time usage. The kernel is in this state for ~ 90seconds.


# lockstat -kIW -D 20 sleep 20

Profiling interrupt: 1972 events in 20.241 seconds (97 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL  Caller
-------------------------------------------------------------------------------
 1679  85%  85% 0.00   246716 cpu[0]           logmap_cancel
  132   7%  92% 0.00   239730 cpu[0]           cpu_halt
   35   2%  94% 0.00   230008 cpu[0]           (usermode)
   18   1%  95% 0.00   325389 cpu[0]           mutex_enter
   12   1%  95% 0.00   241294 cpu[0]           free
   10   1%  96% 0.00   243065 cpu[0]           logmap_cancel_delta
    7   0%  96% 0.00   212141 cpu[0]+11        fakesoftint_return
    6   0%  96% 0.00   260988 cpu[0]           bzero
    5   0%  97% 0.00   243710 cpu[0]           rm_assize
    4   0%  97% 0.00   343741 cpu[0]           deltamap_add
    4   0%  97% 0.00   115899 cpu[0]           indirtrunc
    4   0%  97% 0.00   136087 cpu[0]           tsd_agent_get
    4   0%  97% 0.00   191666 cpu[0]           hash2ints
    4   0%  98% 0.00   138739 cpu[0]           bread_common
    3   0%  98% 0.00   433313 cpu[0]           logmap_free_cancel
    3   0%  98% 0.00   326170 cpu[0]           kcopy
    2   0%  98% 0.00   110654 cpu[0]           poll_common
    2   0%  98% 0.00   187142 cpu[0]           brelse
    2   0%  98% 0.00   307884 cpu[0]           bdwrite
    2   0%  98% 0.00    61904 cpu[0]+11        lock_set_spl
-------------------------------------------------------------------------------

posted by Brahma at 2:52 PM 0 comments

Friday, July 15, 2005

NetBackup Error codes

Hi!

Status Code: 50
client process aborted
The client backup aborted. One instance when this code appears is if a NetBackup master or media server is shut down or rebooted when a backup or restore is in process.

Try the following:

1. Enable detailed debug logging:
* Create a bpbkar debug log directory (UNIX or Windows only).
* Create a bpcd debug log directory (this log is created automatically on Macintosh clients.)
* On UNIX clients, add the VERBOSE option to the /usr/openv/netbackup/bp.conf file.
* On PC clients, increase the debug or log level as explained in the debug log topics in Chapter 3 of the Troubleshooting Guide.
2. Retry the operation and examine the resulting logs.
3. On UNIX clients, check for core files in the / directory.
4. On UNIX clients, check the system log (/usr/adm/messages on Solaris) for system problems.
5. This problem can sometimes be due to a corrupt binary.
On UNIX clients, use the UNIX sum command to check the bpcd, bpbkar, and tar binaries, located in /usr/openv/netbackup/bin on the client. Reinstall them if they are not the same as in the client directory under /usr/openv/netbackup/client on the server.
Run the NetBackup Configuration Validation Utility (NCVU) for the associated NetBackup clients. Note the client software checks in section two.
On a Windows client, check the bpinetd.exe, bpcd.exe, bpbkar32.exe, and tar32.exe executables located in the install_path\NetBackup\bin folder on the client. Reinstall the client if these executables are not the same size as on other Windows clients or are not at the same release level or do not have the same NetBackup patches applied as other Windows clients.

Status Code: 57
client connection refused
The client refused a connection on the port number for bpcd. This can occur because there is no process listening on the bpcd port or there are more connections to the bpcd port than the network subsystem can handle with the listen() call.

Try the following:

1. For Windows NetBackup servers:
a. Make sure the NetBackup client software is installed.
b. Verify that the bpcd and bprd port numbers in the %SystemRoot%\system32\drivers\etc\services file on the server match the setting on the client.
c. Verify that the NetBackup Client Service Port number and NetBackup Request Service Port number on the Network tab in the NetBackup Client Properties dialog match the bpcd and bprd settings in the services file. To display this dialog, start the Backup, Archive, and Restore interface on the server and click NetBackup Client Properties on the File menu. The values on the Network tab are written to the services file when the NetBackup Client service starts.
d. Verify that the NetBackup client service is running.
e. Use the following command to see if the master server returns correct information for the client:
install_path\VERITAS\NetBackup\bin\bpclntcmd -pn
2. For UNIX servers:
a. Make sure the NetBackup client software is installed.
b. Verify that the bpcd port number on the server (either NIS services map or in /etc/services) matches the number in the client's services file.
c. Run the NetBackup Configuration Validation Utility (NCVU) for the associated NetBackup nodes. Note the NetBackup services port number checks in section one.
3. For a Macintosh or NetWare target client, verify that the server is not trying to connect when a backup or restore is already in progress on the client. These clients can handle only one NetBackup job at a time.
4. Perform "Resolving Network Communication Problems" in the Troubleshooting Guide.
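For step 2b, a quick check on a UNIX server or client might look like this (13724 and 13782 are the usual NetBackup defaults; verify them against your own services file or NIS map):

# grep -w bprd /etc/services
bprd    13724/tcp    bprd
# grep -w bpcd /etc/services
bpcd    13782/tcp    bpcd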

May I know what version of NetBackup you are using? You might also want to check whether /kernel/drv/st.conf and /kernel/drv/sd.conf were overwritten. Thanks.

posted by Brahma at 3:41 PM 0 comments

Summary: System hardware migration best practices

Subject: Summary: System hardware migration best practices

Thanks for responses from 3 people.

Amanda Wynn - Referred a company called Solarcom.
---------------------------------------------------------
Ken Rossman suggested the following:

First, are you able to completely clone the old systems over to totally new duplicate (or similar) hardware? If so, the job should be a good bit easier.

- If you are not able to completely clone over everything, what pieces (e.g. storage?) have to stay the same and simply have their connections moved over (or however you are doing it)?


- Is the HDS and EMC storage all SAN-connected (or at least fiber channel connected in any case)?

> Brief outline i was thinking is: (oracle DB and HDS/EMC storage)
>
> = Build new system with a different name and copy config files. Use
> sys-unconfig on old systems during cutover.

Sounds more or less OK, though there are still some bits and pieces of information I am missing here...

> = For data file systems from SAN, perform veritas level deport and> import.

Might be the best route, but again, not sure I have enough info here.

In general, you'll want to separate out the concepts of boot environment disks and data disks (but you probably already knew that). Build the boot environment disks fresh, from scratch, and copy over various other specific configuration files you'll need. Then, work separately with the data disks once you know the new platform is stable and close to ready to go.

If you have completely cloned hardware (including storage, which I suspect is NOT the case), you could use one of the data synchronizer products from either EMC or Hitachi to "sync" the two data storage pools while still live. That's what they are good for. I don't remember the name of either product right now, however.
----------------------------------------------------------------------------

[email protected] provided some ideas as follows:

You could do flarcreate on the old system if it is Sol 8 or higher, then use the archive to install the new systems.
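A minimal flarcreate sketch (the archive name and NFS path are illustrative):

# flarcreate -n "oldhost-golden" -c /net/nfshost/export/flash/oldhost.flar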

Alternatively, fresh install, then:

find / /usr /var /opt /usr/local -mount | sort -u > /tmp/installed
cut -c 1-80 /var/sadm/install/contents | sed 's/=/ /g' | awk '{ print $1 }' | sort -u > /tmp/reg

On old system:

sdiff -w 160 -s /tmp/installed /tmp/reg

should give GOOD pointers as to what's missing on the new system, since this shows the additional files on the OS.


Then look at files in /etc for customised files:
shadow, passwd, group, inet/*, inet/inetd.conf, /etc/default/*

On the disk side, gr8 if both are on the SAN at the same time. Check version of VRTS though, and patches.

on old sys:
umount all FS'es after stopping activity
vxvol -g your_dg stopall
vxdg deport your_dg

On new sys:
vxdg import your_dg
vxvol -g your_dg startall
now mount. Name still same in /etc/vfstab
(Do
  grep "your_dg" /etc/vfstab > /tmp/vfsadd
on old sys)

Trx to new sys and do
cat /tmp/vfsadd >> /etc/vfstab

Mount manually one by one to test:
mount /mnt1
mount /mnt2
... till end. This will verify that the vols are working. Takes longer than mountall, but saves more time if something is wrong :-)

posted by Brahma at 3:40 PM 0 comments

wget

% wget 'http://sunsolve.sun.com/pub-cgi/pdownload.pl?target=patchdiag.xref'
--21:20:36--  http://sunsolve.sun.com/pub-cgi/pdownload.pl?target=patchdiag.xref
           => `pdownload.pl?target=patchdiag.xref'
Resolving sunsolve.sun.com... 192.18.108.40
Connecting to sunsolve.sun.com[192.18.108.40]:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://patches.sun.com/reports/patchdiag.xref [following]
--21:20:37--  http://patches.sun.com/reports/patchdiag.xref
           => `patchdiag.xref.2'
Resolving patches.sun.com... 192.18.108.60
Connecting to patches.sun.com[192.18.108.60]:80... connected.


HTTP request sent, awaiting response... 200 OK
Length: 1,552,752 [text/plain]

100%[==========================================================================>]1,552,752 161.75K/s ETA 00:00

21:20:47 (151.75 KB/s) - `patchdiag.xref.2' saved [1552752/1552752]

Same works for patches themselves. You want something more?

posted by Brahma at 3:39 PM 0 comments

check process is running .sh

You think that's all there is to it? Come on, users are much cleverer than that! How about one of these tests next? Could save you time later explaining why the user's latest work has gone missing.

Example: file exists

if [ ! -f $infile ]
then
echo "Input file [$infile] not found - Aborting"
exit
fi

Note the use of the test not flag (!) in front of the -f (file exists) flag. Or how about:

Example: verify

if [ -f $outfile ]
then
echo "Output file [$outfile] already exists"
/usr/5bin/echo "Okay to overwrite? ( y/n ) : \c"
read answer
if [ "$answer" = "n" ] || [ "$answer" = "N" ]
then
echo "Aborting"
exit
fi
fi

Well, it might work. But there is a failing with this second example. The test of the user answer tests for a "NO!" response. If the user misses the key, it will carry on. Far better to test for a "YES!" response which if missed just fails safe. Revise the test line to:

if [ "£answer" != "y" ] && [ "$answer" != "Y" ]

Note the use of the logical and (&&) instead of the logical or (||) here.

[vsequei@sequeira chapter10]$ more check_process.sh
#!/bin/bash

process="/usr/sbin/httpd"start="service httpd restart"

ps ax | awk '{print $5}' | grep -q "^$process$" || {
    # Apparently not running, so start the process
    eval "$start"
    exit $?
}

exit 0

[vsequei@sequeira chapter10]$ more check_port.sh
#!/bin/bash

port="80"restart="service httpd restart"

netstat -ln | awk '/^tcp/ {print $4}' | grep -q ":$port$" || {
    # Apparently not listening, so run restart command
    eval "$restart"
    exit $?
}

exit 0
[vsequei@sequeira chapter10]$

just found mysqld had died for the second time in a week, nothing obvious, so had a search for a good script to check and restart if necessary. I'm clueless when it comes to shell scripting, and even if I was I'd still search - it's the kind of thing where it would be easy to miss something obvious. I was surprised to find…nothing half-useful (is it so obvious?), so ended up having a go anyway. Plan B: use the blog, Luke. Below is what I've got (as /etc/cron.hourly/mysql.checker); if anyone has something better, please drop in a comment ;-)


#!/bin/sh
if ps -A | fgrep mysqld > /dev/null; then
    echo mysqld is running
else
    echo mysqld is NOT running
    /etc/init.d/mysql start
fi

The script /etc/init.d/mysql I think is mysql.server which comes with the MySQL source; I may have tweaked it.

# Luke Gorrie Says:
February 15th, 2005 at 17:22

In Erlang we use a "supervisor" process that starts the worker. The supervisor's job is to immediately restart the worker if it terminates abnormally. The supervisor should get an active notification about the child terminating (like Unix SIGCHLD) instead of polling or being invoked manually.

Here's a simple Erlang-style supervisor in bourne shell:

#!/bin/sh"$@" || exec "$0" "$@"

NB: Like an Erlang supervisor this is properly tail-recursive. :-)
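Saved as, say, /usr/local/bin/supervise (name and paths illustrative), it would be used like this; the wrapper re-execs itself, and therefore re-runs the command, whenever the command exits non-zero:

% /usr/local/bin/supervise /path/to/mysqld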

posted by Brahma at 3:39 PM 0 comments

Moving a disk from SPARC to x86

Subject: SUMMARY: Moving a disk from SPARC to x86

Thanks to: Rich Teer, Carsten B. Knudsen, Bernd Schemmer, Tim Chipman, Darren Dunham, Jason LeDuc, [email protected].

There is no practical way to move the disk with the data intact.

Rich and Bernd summed it up most succinctly:

Rich: "You'll have to do a dump and restore, because the on-disk UFS formatis not compatibile between SPARC (big endian) and x86 (little endian).The disk itself should be OK, but you won't see the data."

Bernd: "IMHO that is not possible due to the different architectures of x86and

Page 330: Solaris Real Stuff

SPARC (little-endian versus big-endian) You must backup the data,repartition the disk and restore the data."

Dunham points out that: "The anticipated release of ZFS should change this in the future (will be cross-architecture compatible), but doesn't help you today."
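For reference, the dump-and-restore can be done over the wire in one pass; a hedged sketch (hostnames, device and target directory are illustrative):

sparc# ufsdump 0f - /dev/rdsk/c0t1d0s7 | ssh x86host 'cd /newfs && ufsrestore rf -'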

Thanks to all for your help.

Frank

posted by Brahma at 3:38 PM 0 comments

ftpget: Downloading files via ftp

Example A-13. ftpget: Downloading files via ftp

#! /bin/sh
# $Id: ftpget,v 1.2 91/05/07 21:15:43 moraes Exp $
# Script to perform batch anonymous ftp. Essentially converts a list
# of command line arguments into input to ftp.
# ==> This script is nothing but a shell wrapper around "ftp" . . .
# Simple, and quick - written as a companion to ftplist
# -h specifies the remote host (default prep.ai.mit.edu)
# -d specifies the remote directory to cd to - you can provide a sequence
#    of -d options - they will be cd'ed to in turn. If the paths are relative,
#    make sure you get the sequence right. Be careful with relative paths -
#    there are far too many symlinks nowadays.
#    (default is the ftp login directory)
# -v turns on the verbose option of ftp, and shows all responses from the
#    ftp server.
# -f remotefile[:localfile] gets the remote file into localfile
# -m pattern does an mget with the specified pattern. Remember to quote
#    shell characters.
# -c does a local cd to the specified directory
# For example,
#   ftpget -h expo.lcs.mit.edu -d contrib -f xplaces.shar:xplaces.sh \
#          -d ../pub/R3/fixes -c ~/fixes -m 'fix*'
# will get xplaces.shar from ~ftp/contrib on expo.lcs.mit.edu, and put it in
# xplaces.sh in the current working directory, and get all fixes from
# ~ftp/pub/R3/fixes and put them in the ~/fixes directory.
# Obviously, the sequence of the options is important, since the equivalent
# commands are executed by ftp in corresponding order
#
# Mark Moraes <[email protected]>, Feb 1, 1989
#


# ==> These comments added by author of this document.

# PATH=/local/bin:/usr/ucb:/usr/bin:/bin
# export PATH
# ==> Above 2 lines from original script probably superfluous.

E_BADARGS=65

TMPFILE=/tmp/ftp.$$
# ==> Creates temp file, using process id of script ($$)
# ==> to construct filename.

SITE=`domainname`.toronto.edu
# ==> 'domainname' similar to 'hostname'
# ==> May rewrite this to parameterize this for general use.

usage="Usage: $0 [-h remotehost] [-d remotedirectory]... [-fremfile:localfile]... [-c localdirectory] [-m filepattern] [-v]"ftpflags="-i -n"verbflag=set -f # So we can use globbing in -mset x `getopt vh:d:c:m:f: $*`if [ $? != 0 ]; thenecho $usageexit $E_BADARGSfishifttrap 'rm -f ${TMPFILE} ; exit' 0 1 2 3 15# ==> Delete tempfile in case of abnormal exit from script.echo "user anonymous ${USER-gnu}@${SITE} > ${TMPFILE}"# ==> Added quotes (recommended in complex echoes).echo binary >> ${TMPFILE}for i in $* # ==> Parse command line args.docase $i in-v) verbflag=-v; echo hash >> ${TMPFILE}; shift;;-h) remhost=$2; shift 2;;-d) echo cd $2 >> ${TMPFILE}; if [ x${verbflag} != x ]; thenecho pwd >> ${TMPFILE};fi;shift 2;;-c) echo lcd $2 >> ${TMPFILE}; shift 2;;-m) echo mget "$2" >> ${TMPFILE}; shift 2;;-f) f1=`expr "$2" : "\([^:]*\).*"`; f2=`expr "$2" : "[^:]*:\(.*\)"`;echo get ${f1} ${f2} >> ${TMPFILE}; shift 2;;


    --) shift; break;;
    esac
    # ==> 'lcd' and 'mget' are ftp commands. See "man ftp" . . .
done
if [ $# -ne 0 ]; then
    echo $usage
    exit $E_BADARGS
    # ==> Changed from "exit 2" to conform with style standard.
fi
if [ x${verbflag} != x ]; then
    ftpflags="${ftpflags} -v"
fi
if [ x${remhost} = x ]; then
    remhost=prep.ai.mit.edu
    # ==> Change to match appropriate ftp site.
fi
echo quit >> ${TMPFILE}
# ==> All commands saved in tempfile.

ftp ${ftpflags} ${remhost} < ${TMPFILE}
# ==> Now, tempfile batch processed by ftp.

rm -f ${TMPFILE}
# ==> Finally, tempfile deleted (you may wish to copy it to a logfile).

# ==> Exercises:
# ==> ---------
# ==> 1) Add error checking.
# ==> 2) Add bells & whistles.

posted by Brahma at 3:37 PM 0 comments

Generating random 8-character passwords

Example A-14. password: Generating random 8-character passwords

#!/bin/bash
# May need to be invoked with #!/bin/bash2 on older machines.
#
# Random password generator for Bash 2.x by Antek Sawicki <[email protected]>,
# who generously gave permission to the document author to use it here.
#
# ==> Comments added by document author ==>

MATRIX="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


# ==> Password will consist of alphanumeric characters.
LENGTH="8"
# ==> May change 'LENGTH' for longer password.

while [ "${n:=1}" -le "$LENGTH" ]# ==> Recall that := is "default substitution" operator.# ==> So, if 'n' has not been initialized, set it to 1.doPASS="$PASS${MATRIX:$(($RANDOM%${#MATRIX})):1}"# ==> Very clever, but tricky.

    # ==> Starting from the innermost nesting...
    # ==> ${#MATRIX} returns length of array MATRIX.

    # ==> $RANDOM%${#MATRIX} returns random number between 1
    # ==> and [length of MATRIX] - 1.

    # ==> ${MATRIX:$(($RANDOM%${#MATRIX})):1}
    # ==> returns expansion of MATRIX at random position, by length 1.
    # ==> See {var:pos:len} parameter substitution in Chapter 9
    # ==> and the associated examples.

# ==> PASS=... simply pastes this result onto previous PASS (concatenation).

    # ==> To visualize this more clearly, uncomment the following line
    #     echo "$PASS"
    # ==> to see PASS being built up,
    # ==> one character at a time, each iteration of the loop.

    let n+=1
    # ==> Increment 'n' for next pass.
done

echo "$PASS" # ==> Or, redirect to a file, as desired.

exit 0

posted by Brahma at 3:37 PM 0 comments

Killing processes by name

Example 12-7. Killing processes by name

#!/bin/bash
# kill-byname.sh: Killing processes by name.
# Compare this script with kill-process.sh.


# For instance,
#+ try "./kill-byname.sh xterm" --
#+ and watch all the xterms on your desktop disappear.

# Warning:
# -------
# This is a fairly dangerous script.
# Running it carelessly (especially as root)
#+ can cause data loss and other undesirable effects.

E_BADARGS=66

if test -z "$1" # No command line arg supplied?thenecho "Usage: `basename $0` Process(es)_to_kill"exit $E_BADARGSfi

PROCESS_NAME="$1"ps ax | grep "$PROCESS_NAME" | awk '{print $1}' | xargs -i kill {} 2&>/dev/null# ^^ ^^

# -----------------------------------------------------------
# Notes:
# -i is the "replace strings" option to xargs.
# The curly brackets are the placeholder for the replacement.
# 2&>/dev/null suppresses unwanted error messages.
# -----------------------------------------------------------

exit $?
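On Solaris 7 and later the same effect is available without the grep/awk/xargs pipeline, via pkill (same caveats about killing by name apply):

pkill xterm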

posted by Brahma at 3:36 PM 0 comments

debugging

If you are only interested in debugging rather than logging, then you might look at setting PS4 to something along the lines of

PROGNAME=$(basename $0)
PS4="TRACE \$PROGNAME \$FUNCNAME \$LINENO: "

(built-in shell variables may vary) and you can add some timing information if you really care. PS4 is printed when tracing with set -x or -v.
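A small self-contained sketch of that idea (bash syntax; FUNCNAME and LINENO support differs between shells, so treat the variables as examples):

#!/bin/bash
# each traced command is prefixed with program name and line number
PROGNAME=$(basename $0)
PS4='TRACE $PROGNAME line $LINENO: '
set -x
ls /tmp > /dev/null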

posted by Brahma at 3:26 PM 0 comments

customize my bash prompt


hey, I'd like to customize my bash prompt a bit...

currently I have:
PS1="-\h:\w- \u$ "

the problem I have is sometimes I get pretty deep into my directory structure, and some of my directory names are pretty long. Is there some way to change the \w in my prompt to:
1) if the directory structure is less than 30 characters, just display it
2) if the directory structure is more than 30 characters, change it to display three dots (like an ellipsis) and the last 27 characters of the path
for instance:
/some_really_long_path/going/to/some_place gets changed to
...ng_path/going/to/some_place

so the prompt would look like:
-hostname:...ng_path/going/to/some_place- username$

thanks in advance...


On 2005-07-12, [email protected] wrote:

> hey, I'd like to customize my bash prompt a bit...

> currently I have:
> PS1="-\h:\w- \u$ "

> the problem I have is sometimes I get pretty deep into my directory
> structure, and some of my directory names are pretty long. Is there
> some way to change the \w in my prompt to:
> 1) if the directory structure is less than 30 characters, just display it
> 2) if the directory structure is more than 30 characters, change it to
> display three dots (like an ellipsis) and the last 27 characters of the path
> for instance:
> /some_really_long_path/going/to/some_place gets changed to
> ...ng_path/going/to/some_place

> so the prompt would look like:
> -hostname:...ng_path/going/to/some_place- username$


PROMPT_COMMAND='if [ ${#PWD} -gt 30 ]
then PS1="-\h:...${PWD: -27}\u$ "
else PS1="-\h:\w- \u$ "
fi'

--

posted by Brahma at 3:25 PM 0 comments

Thursday, June 30, 2005

sshd Configuration in Solaris 8

Re: sshd Configuration in Solaris 8
Posted By Rajendra Yadav On Tuesday, June 28, 2005 at 7:48 AM

hi
do the following

# mkdir /var/empty
# chown root:sys /var/empty
# chmod 755 /var/empty
# groupadd sshd
# useradd -g sshd -c "SSHD Admin" -d /var/empty -s /bin/false sshd
# ssh-keygen -t rsa1 -f /usr/local/etc/ssh_host_key -N ""
# ssh-keygen -t dsa -f /usr/local/etc/ssh_dsa_key -N ""
# ssh-keygen -t rsa -f /usr/local/etc/ssh_host_rsa_key -N ""

then you start the sshd services

Regards
Rajendra

--- sagar via solaris-l<email@removed> wrote:



> Hi Guys
> I am new to sshd configuration.
> I have installed OpenSSH on Solaris 8.
>
> I have a few queries:
>
> What is prngd? Why is it used?
> Also when I start the sshd daemon I am getting the
> following message:
> Privilege separation user sshd does not exist
>
> What is the privilege separation user?
>
> Please assist
>
> Regards
>
> Sagar

prngd is a random number generator, only required if your kernel does not have /dev/random and /dev/urandom. There are kernel patches for Solaris that give you those devices. You might already have them.

The sshd privilege separation user is used by the daemon to lower its privileges, so it doesn't have to run as root all the time before a user has authenticated (when it runs as that user). It increases security and makes it harder for someone to obtain remote root access in the event that a buffer overflow is discovered/exploited in sshd.
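Once the sshd account and /var/empty exist (as in the recipe above), privilege separation is controlled from sshd_config; the relevant line (the path depends on where your OpenSSH build looks for its config) is:

# e.g. /usr/local/etc/sshd_config
UsePrivilegeSeparation yes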

posted by Brahma at 1:03 PM 0 comments

verbose boot

Ever wonder what the box is doing before the banner and OBP prompt is displayed?

Ever wonder what Solaris is doing before the banner and OK prompt is displayed? You must have console access to do this; simply type "v" for verbose.

Solaris will output its diagnostic report to the console rather than hiding it. This does not increase boot time.

posted by Brahma at 1:03 PM 0 comments


Wednesday, June 29, 2005

coreadm gcore

I have a SUID daemon, "dspam", that I fire up as root, and it suid's to user "dspam". It segfaults after some time, and I need to get a core file (but cannot get it to leave one).

I've tried

ulimit -c unlimited

as root before I fire up the daemon. I have tried

coreadm -e global -e process -e global-setid -e proc-setid

The dspam user owns the directory where the daemon resides and from where I fire it up.

When it segfaults, no core file is left.
When I run gcore on it as user dspam, I get

gcore: cannot grab 18158: permission denied

I'm using Solaris 9 (5.9 Generic_117171-17).

More debugging info, in case it helps:

root@rita-blanca:/use/site/dspam-3.4.8/src> ulimit -a
core file size (blocks)     unlimited
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
open files                  1024
pipe size (512 bytes)       10
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          9845
virtual memory (kbytes)     unlimited

root@rita-blanca:/usr/site/dspam-3.4.8/src> coreadm
     global core file pattern:
       init core file pattern: core
            global core dumps: enabled
       per-process core dumps: enabled
      global setid core dumps: enabled
 per-process setid core dumps: enabled
     global core dump logging: enabled

root@rita-blanca:/usr/site/dspam-3.4.8/src> ls -lF dspam
-r-sr-sr-x   1 dspam  antispam  1560016 Jun 22 14:11 dspam*


root@rita-blanca:/usr/site/dspam-3.4.8/src> ./dspam --daemon &
[1] 18158

root@rita-blanca:/usr/site/dspam-3.4.8/src> ps -ef | grep -- --daemon
   dspam 18158  2177  0 08:09:59 pts/39   0:00 ./dspam --daemon

root@rita-blanca:/usr/site/dspam-3.4.8/src> su dspam

dspam@rita-blanca:/usr/site/dspam-3.4.8/src> gcore 18158
gcore: cannot grab 18158: permission denied

Any help is greatly appreciated.
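One detail visible in the coreadm output above is that the global core file pattern is empty. A hedged sketch (the directory is illustrative) that gives set-id processes somewhere to dump:

# mkdir -p /var/cores
# coreadm -g /var/cores/core.%f.%p -e global -e global-setid
# coreadm -u     # apply the updated settings now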

posted by Brahma at 10:06 AM 0 comments

Ufsdump Tape script

> I need a simple script to dump a server mirrored with Solstice Disksuite to
> tape, then demonstrate how to recover it to a new disk.
>
> I'm dumping using this:
>
> ufsdump 0uf /dev/rmt/0hn /
> ufsdump 0uf /dev/rmt/0hn /var
> ufsdump 0uf /dev/rmt/0hn /opt
> ufsdump 0uf /dev/rmt/0h /usr
>
> The tape successfully rewinds at the end.
>
> Then I tried this script to see what's on the tape:
>
> ufsrestore tvf /dev/rmt/0hn > /root.txt
> ufsrestore tvsf /dev/rmt/0hn 2 > /var.txt
> ufsrestore tvsf /dev/rmt/0hn 3 > /opt.txt
> ufsrestore tvsf /dev/rmt/0h 4 > /usr.txt
>
> The script crashed saying:
>
> # ./fullrest.sh
> ioctl MTFSF: I/O error
> Mount volume 2
> then enter volume name (default: /dev/rmt/0hn) ^C
> ufsrestore interrupted, continue? [yn] n
>
> The root.txt file contains the contents of /var
>
> so obviously something's wrong.
>
> I've tried fiddling with mt commands to no effect. I've also tried leaving
> off the "s" parameter and just trusting the order on the tape.
>
> What am I doing wrong?

What I'm doing wrong is making things too complicated and using ufsrestore incorrectly.

> ufsrestore tvsf /dev/rmt/0h 4 > /usr.txt

Should read:

ufsrestore tvfs /dev/rmt/0h 4 > /usr.txt

...but that's still too complicated. I broke it into two scripts.

$ more fullback.sh
#!/bin/sh
echo "rewinding tape"
mt -f /dev/rmt/0 rewind
echo "backing up root"
ufsdump 0uf /dev/rmt/0hn /
echo "backing up var"
ufsdump 0uf /dev/rmt/0hn /var
echo "backing up opt"
ufsdump 0uf /dev/rmt/0hn /opt
echo "backing up usr"
ufsdump 0uf /dev/rmt/0hn /usr
echo "rewinding tape"
mt -f /dev/rmt/0 rewind

$ more fulltest.sh
#!/bin/sh
echo "moving tape to beginning"
mt -f /dev/rmt/0 rewind
echo "testing restoring root"
ufsrestore tvf /dev/rmt/0hn > /root.txt
echo "testing restoring var"
ufsrestore tvf /dev/rmt/0hn > /var.txt
echo "testing restoring opt"
ufsrestore tvf /dev/rmt/0hn > /opt.txt
echo "testing restoring usr"
ufsrestore tvf /dev/rmt/0hn > /usr.txt
echo "rewinding"
mt -f /dev/rmt/0 rewind

Later on I can get fancy using "mt" with "asf" to position the tape.
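Positioning by file number would look something like this (ufsdump files on the tape are numbered from 0, so the third dump is file 2; the no-rewind device keeps the position for the restore):

mt -f /dev/rmt/0n asf 2
ufsrestore tvf /dev/rmt/0hn > /opt.txt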

Thanks to:

Darren Dunham, Alexandre Wanderley, Rich Kulawiec, Peter Stokes, Ric Anderson, Kevin Gallagher, John Benjamins, Ray Brownrigg (I think I got everybody)

posted by Brahma at 10:02 AM 0 comments

Script to get predefined kernel tunables

#!/bin/ksh

#
# Script to get predefined kernel tunables
#
# $Log: get_kernel_values,v $
# Revision 1.2  2003/09/03 21:40:29  MS35068
# *** empty log message ***
#
# Revision 1.1  2003/09/03 21:39:42  MS35068
# Initial revision
#
# Written by Matthew Baker

HEX=0
P_32VALUE=D
P_64VALUE=E

if (( $# ))
then
    if [[ "$1" == "-x" ]]
    then
        HEX=1

elif [[ "$1" == "-V" ]]thenclearcat <<- EOF | more

NOTE: ***NOT*** all kernel values can be changed dynamically!!!


Watch: 32bit or 64bit kernel:
/usr/bin/isainfo -kv     # see if 32 or 64 bit

if 32,   64 bit
D        E
X        J
W        Z

D - long decimal,  E - 8 bytes unsigned
X - 4 bytes hex,   J - 8 bytes hex
W - 4-byte write,  Z - 8-byte write to address

To change,

use:       adb -kw /dev/ksyms /dev/mem
then i.e.: ncsize/W 0t8000
then i.e.: max_page_get/Z 0x1ad3
then i.e.: slowscan/Z 0x190    # set slowscan to 400

or

echo 'ncsize/W 0t8000' | adb -kw

To list all possible kernel values:
/usr/ccs/bin/nm /platform/`uname -m`/kernel/*unix | more
EOF
exit 0

elif [[ "$1" == "-v" ]]thencat <<- EOF | more

Last Updated: 12/21/2001
ND - NOT Dynamic (needs reboot - modify in /etc/system)
D  - Dynamic (can change on active/running system, still need /etc/system to survive a reboot)

NOTE: memory changes (as in slowscan, fastscan, ...) will be lost if a memory DR occurs and they are not in the /etc/system file

autoup - ND: max age of any memory-resident pages that have been modified [def: 30, range: 4-240]

bufhwm - ND: max amount of memory for caching I/O buffers [def: 2% of phys mem, range: 80KB to 20% of phys mem]


cachefree - obsolete

coredefault - x

consistent_coloring - D: method to use for external CPU cache (0 - uses virtual address bits, 1 - physical address is set to virtual address, 2 - bin hopping, 6 - SBSD page coloring scheme)

desfree - D: amount of memory desired to be free at all times on the system [def: lotsfree/2, range: greater of 1/64 of physmem or 512KB, up to 15% of physical memory]

dnlc_dir_enable - D: enables large directory caching [def: 1, range: 0-1]

dnlc_dir_min_size - D: min number of entries before caching one directory [def: 40, range: 0-MAXUNIT (no maximum)]

dnlc_dir_max_size - D: max number of entries cached for one directory [def: MAXUNIT (no maximum), range: 0-MAXUNIT]

doiflush - D: flush memory or not [def: 1, range:0-1]

dopageflush - D: flush memory or not [def: 1, range:0-1]

fastscan - D: max number of pages per second the system looks at when scanning and pressure is high [def: lesser of 64MB and 1/2 of phys mem, range: 1 - 1/2 of phys mem]

handspreadpages - D: two-hand clock [def: fastscan, range: 1 - number of phys memory pages]

hires_tick - ND: kernel clock tick resolution [def: 0, range: 0-1]

ipc_tcp_conn_hash_size - ND: controls the hash table size in the IP module for all active (ESTABLISHED) TCP connections [def: 512, range: 512-1073741824]

ip_icmp_err_interval & ip_icmp_err_burst - D: control rate of IP in generating ICMP error messages [def: 100 ms for interval, 10 for burst; range: 0-99999 ms for interval, 1-99999 for burst]

ip_forwarding & ip6_forwarding - D: control whether IP does forwarding between interfaces: [def: 0 (off), range: 0-1]

ip_forwarding & ip6_forwarding - D: control whether IP does forwarding for a particular interface: [def: 0 (off), range: 0-1]


ip_respond_to_echo_broadcast & ip6_respond_to_echo_broadcast - D: control whether IP responds to broadcast ICMPv4 echo request or multicast ICMPv6 echo request: [def: 1 (on), range: 0-1]

ip_send_redirects & ip6_send_redirects - D: control whether IP sends out an ICMP redirect message [def: 1 (on), range: 0-1]

ip_forward_src_routed & ip6_forward_src_routed - D: control whether IPv4 or IPv6 forwards packets with source IPv4 routing options or IPv6 routing headers [def: 1 (on), range: 0-1]

ip_addrs_per_if - D: max num of logical interfaces assoc with real interface [def: 256, range: 1-8192]

ip_strict_dst_multihoming & ip6_strict_dst_multihoming - D: determine whether a packet arriving on a non-forwarding interface can be accepted for an IP address that is not explicitly configured on that interface. If ip_forwarding is enabled, or xxx:ip_forwarding for the appropriate interfaces is enabled, then this parameter is ignored, because the packet is actually forwarded. [def: 0 (loose multihoming), range: 0-1 (strict multihoming)]

kmem_flags - D: debugging kernel flags [def: 0 (off), range: 0,1,2,4,8,256]

kobj_map_space_len - D: amount of kernel memory allocated to store symbol table info. [def: 1Mb]

lotsfree - D: initial trigger for system paging to begin (page scanner) [def: greater of 1/64 of physmem or 512 KB, range: default - max_number_of_phys_mem_pages]

lwp_default_stksize - D, size of the kernel stack for lwps (size in bytes and should be multiple of PAGESIZE) [0-262144]

max_nprocs - ND, max number of processes that can be created (this value is used in computing others) [def: 10 + 16 * maxusers, range: 266-maxpid]

max_page_get - D: limits the max num of pages that can be allocated in a system. [def: half the number of pages in system, range: ???]

maxpgio - ND: max num of page I/O requests that can be queued by paging system (divided by 4 to get the actual max used) [def: 40, range: 1-1024]


maxphys - D (but may not affect all loaded structures): [def: 131072 (sun4u), range: pagesize - MAXINT]

maxpid | pidmax - ND: largest possible PID (solaris 8 and greater) [def: 30000, range: 266-999,999]

maxuprc - ND: per user process limit [def: reserved_procs, range: 1-max_nprocs]

maxusers - ND: system wide calculation value taken from RAM size (def: lesser of the amount of memory in MB and 2048, range in /etc/system: 8-4096)

minfree - D: minimum acceptable memory level. [def: desfree/2, range: greater of 1/256 of physmem or 128 KB, range: default - max physical memory pages (7.5% of memory)]

min_percent_cpu - D: min percent of CPU that pageout can consume [def: 4, range: 1-80]

moddebug - D: debug module loading process [def: 0 (off), common: 2,4,8]

ngroups_max - max number of supplementary groups per user

ncsize - ND: directory name lookup cache - DNLC [def: 4 * (v.v_proc + maxusers) + 320, range: 0 - MAXINT]

ndquot - ND: number of quota structures for UFS that should be allocated [def: maxusers * 40 / 4 + max_nprocs, range: 0 - MAXINT]

noexec_user_stack - D: mark stack as not executable - for security (only needed for 32 bit, 64 bit apps have it by default) [def: 0 (off), range: 0-1]

nrnode - ND: max num of rnodes allocated (NFS inode) [def: ncsize, range: ???]

nstrpush - D: number of modules that can be inserted (pushed) into a stream [def: 9, range 9-16]

npty - ND: total number of 4.0|4.1 pseudo-ttys configured [def: 48, range: ???]

pageout_reserve - D: number of pages reserved for the exclusive use of the pageout or scheduler threads. [def: throttlefree/2, range: the minimum value is 64 Kbytes or 1/512th of physical memory, whichever is greater, expressed as pages (no more than 2% of phys mem)]


pages_pp_maximum - D: number of pages the system requires to be unlocked [def: max of triplet (200, tune_t_minarmem + 100, (10% of memory avail at boot)), max: 20% of phys mem]

pages_before_pager - ND: part of sys threshold that immediately frees pages after an I/O completes instead of storing pages for possible reuse [def: 200, range: 1-phys mem]

physmem - D: size of physical memory [1-size_of_memory]

priority_paging - ND: PRE 2.8, enabling the system to place a boundary around the file cache (ensure that FS I/O does not cause application paging)

pt_cnt - ND: total number of 5.7 pseudo-ttys configured [def: 0, range: 0 - MAXPID]

pt_pctofmem - ND: max percent of phys mem that can be used by /dev/pts entries. 64 bit kernel uses 176 bytes per /dev/pts, 32 bit 112 bytes [def: 5, range: 0-100]

pt_max_pty - D: max number of pty the system offers [def: 0 (uses system defined max (see pt_cnt)), range: 0 - MAXUNIT]

rechoose_interval - D: number of clock ticks before a process is deemed to have lost all affinity for the last CPU it ran on. After this interval expires, any CPU is considered a candidate for scheduling a thread. Valid only for threads. [def: 3, range: 0 - MAXINT]

reserved_procs - ND: number of system process slots to be reserved in PID table for UID 0 processes [def: 5, range: 5 - MAXINT]

rlim_fd_cur - ND: current file descriptor (files open) limit (soft ulimit) [def: 256, range: 1 - MAXINT]

rlim_fd_max - ND: maximum file descriptor (files open) limit (hard ulimit) [def: 1024, range: 1 - MAXINT]

rstchown - D: allow chown by users other than root [def: 1 (on - can't chown unless root), range: 0-1]

sadcnt - number of sad devices

nautopush - ND: number of sad autopush structures [def: 32]


segkpsize - ND: amount of kernel pageable memory available primarily for kernel threads [def: 32_bit_kernel#512MB, 64_bit#2GB, range: 32_bit#512MB, 64_bit#512MB-2GB]

slowscan - D: min number of pages per second that the system looks at when reclaiming memory [def: lesser of 1/20 of phys mem or 100, range: 1 to fastscan/2]

strmsgsz - D: max number of bytes a single system call can pass to a STREAM to be placed in the data part of a message. If a write exceeds this size, it is broken into multiple messages [def: 65,536, range 0-262144]

strctlsz - D: max number of bytes a single system call can pass to a STREAM to be placed in the control part of a message. [def: 1024, range 0-MAXINT]

swapfs_reserve - N: when allocating for actual need on backing store, it keeps the system from deadlock if there is excessive consumption [def: lesser of 4MB or 1/16 phys mem, range: default - max number of phys mem pages]

swapfs_minfree - N: amount of phys mem that is desired to be kept free for the rest of the system [def: larger of 2MB and 1/8 of phys mem, range: 1 - phys mem]

throttlefree - D: memory level at which blocking memory allocation requests are put to sleep [def: minfree, range: greater of 1/256 of physmem or 128 KB, range: default - max physical memory pages (4% of physical memory)]

timer_max - ND: number of POSIX timers available [def: 32, range: 0 - MAXINT]

tune_t_flckrec - max number of active frlocks

tune_t_fsflushr - ND: fsflush run interval [1 - MAXINT]

tune_t_gpgslo - ND: page stealing low water mark, see /usr/include/sys/tuneable.h

tune_t_minarmem - ND: min available resident (not swappable) memory needed to avoid deadlock (in pages) [def: 25, range: 1-phys_mem]

tune_t_minasmem - min available swappable memory needed to avoid deadlock (in pages)


tcp_conn_hash_size - ND: controls the hash table size in the tcp module for all tcp connections [def: 512, range: 512-1073741824]

tcp_deferred_ack_interval - D: time-out value for tcp delayed ACKs, time in milliseconds [def: 100, range: 1ms-1minute]

tcp_deferred_acks_max - D: the maximum number of TCP segments (in units of maximum segment size MSS for individual connections) received before an acknowledgment (ACK) is generated. If set to 0 or 1, it means no delayed ACKs, assuming all segments are 1 MSS long. Note that for remote destinations (not directly connected), the maximum number is fixed to 2, no matter what this parameter is set to. The actual number is dynamically calculated for each connection. The value is the default maximum. [def: 8, range: 0-16]

tcp_wscale_always - D: if set to 1, TCP always sends a SYN segment with the window scale option, even if the option value is 0. Note that if TCP receives a SYN segment with the window scale option, even if the parameter is set to 0, TCP responds with a SYN segment with the window scale option, and the option value is set according to the receive window size. [def: 0 (off), range: 0-1]

tcp_tstamp_always - D: if set to 1, TCP always sends a SYN segment with the timestamp option. Note that if TCP receives a SYN segment with the timestamp option, TCP responds with a SYN segment with the timestamp option even if the parameter is set to 0. [def: 0 (off), range: 0-1]

tcp_xmit_hiwat - D: the default send window size in bytes. Refer to the following discussion of per-route metrics for setting a different value on a per-route basis. [def: 16384, range: 4096-1073741824]

tcp_recv_hiwat - D: the default receive window size in bytes. Refer to the following discussion of per-route metrics for setting a different value on a per-route basis. [def: 24576, range: 2048-1073741824]

tcp_max_buf - D: the maximum buffer size in bytes. It controls how large the send and receive buffers are set to by an application using [def: 1048576, range: 8192 - 1073741824]

tcp_cwnd_max - D: max value of the TCP congestion window (cwnd) in bytes [def: 1048576, range: 128-1073741824]

tcp_slow_start_initial - D: DO NOT CHANGE! The maximum initial congestion window (cwnd) size in MSS of a TCP connection.


tcp_slow_start_after_idle - D: the congestion window size in MSS of a TCP connection after it has been idled (no segment received) for a period of one retransmission timeout (RTO). [def: 4, range: 1-16384]

tcp_sack_permitted - D: if set to 2, TCP always sends a SYN segment with the selective acknowledgment (SACK) permitted option. If TCP receives a SYN segment with a SACK-permitted option and this parameter is set to 1, TCP responds with a SACK-permitted option. If the parameter is set to 0, TCP does not send a SACK-permitted option, regardless of whether the incoming segment contains the SACK permitted option or not. [def: 2, range: 0 (disabled), 1 (passive enabled), 2 (active enabled)]

tcp_rev_src_routes - D: if set to 0, TCP does not reverse the IP source routing option for incoming connections for security reasons. If set to 1, TCP does the normal reverse source routing. [def: 0 (off), range: 0-1]

tcp_time_wait_interval - D: the time in milliseconds a TCP connection stays in TIME-WAIT state. Replaced tcp_close_wait_interval. See "ndd -get /dev/tcp | grep tcp_time_wait_interval". [def: 4 minutes, range: 1 second - 10 minutes]

tcp_conn_req_max_q - D: the default maximum number of pending TCP connections for a TCP listener waiting to be accepted by accept(3SOCKET) [def: 128, range 1-4294967296]

tcp_conn_req_max_q0 - D: the default maximum number of incomplete (three-way handshake not yet finished) pending TCP connections for a TCP listener. [def: 1024, range 1-4294967296]

tcp_conn_req_min - D: the default minimum value of the maximum number of pending TCP connection requests for a listener waiting to be accepted. This is the lowest maximum value of listen(3SOCKET) an application can use. [def: 1, range: 1-1024]

tmpfs_maxkmem - D: max amount of kernel memory that TMPFS can use for its data structures (tmpnodes and directory entries) [range: number of bytes in one page to 25% of available kernel memory]

tmpfs_minfree - D: min amount of swap space that TMPFS leaves for the rest of the system [def: 256Bytes, range: 0 - max swap space size]

udp_xmit_hiwat - D: def max UDP socket datagram size in bytes [def: 8192, range: 4096-65536]


udp_recv_hiwat - D: def max UDP socket receive size in bytes [def: 8192, range: 4096-65536]

ufs_ninode - D: number of inodes to be held in memory [def: ncsize, range: 0 - MAXINT]

ufs_LW - D: unflushed UFS data Low Water mark [def: 256 * 1024, range: 0 - MAXINT]

ufs_HW - D: unflushed UFS data High Water mark [def: 384 * 1024, range: 0 - MAXINT]

ufs_WRITES - D: if ufs_WRITES is non-zero, the number of bytes outstanding for writes on a file is checked. See ufs_HW subsequently to determine whether the write should be issued or should be deferred until only ufs_LW bytes are outstanding. The total number of bytes outstanding is tracked on a per-file basis so that if the limit is passed for one file, it won't affect writes to other files. [def: 1 (on), range: 0-1]

seminfo_semaem - ND: max value that a semaphore value in an undo structure can be set to [def: 16384, range: 1-65535]

seminfo_semmsl - ND: max number of SV semaphores per semaphore identifier [def: 25, range: 1 - MAXINT]

seminfo_semmap - ND: number of entries in the semaphore map [def: 10]

seminfo_semmni - ND: max number of semaphore identifiers [def: 10, range: 1-65535]

seminfo_semmns - ND: max number of SV semaphores on system [def: 60, range: 1 - MAXINT]

seminfo_semmnu - ND: total number of undo structures supported by SV semaphore system [def: 30, range: 1 - MAXINT]

seminfo_semopm - ND: max number of SV semaphore operations per semop call (number of sembufs in the sops array that is provided to the semop sys call) [def: 10, range: 1 - MAXINT]

seminfo_semume - ND: max number of SV semaphore undo structures that can be undone by any one process [def: 10, range: 1 - MAXINT]

seminfo_semvmx - ND: max value a semaphore can be set to [def: 32767, range: 1-65535]


shminfo_shmmin - ND: DO NOT CHANGE! min size of SV shared memory segment that can be created [def: 1, range: 0-phys mem]

shminfo_shmmni - ND: sys wide limit on num of shared memory segments that can be created [def: 100, range: 0 - MAXINT]

shminfo_shmmax - ND: max size of SV shared memory segment that can be created [def: 1048576, range: 32bit# 0 - MAXINT, 64bit# 0 - MAXINT64]

shminfo_shmseg - ND: limit on num of shared memory segments that any one process can create [def: 6, range: 0-32767]

segspt_minfree - ND: pages of sys memory that cannot be allocated for ISM shared memory [def: 5% of avail sys memory when first ISM segment is created, range: 0-32767]

scsi_options - see /usr/include/sys/scsi/targets/ssddef.h

sd_io_time - see /usr/include/sys/scsi/targets/ssddef.h

sd_max_throttle - see /usr/include/sys/scsi/targets/ssddef.h

EOF
exit 0

    else
        print "Usage: $0 [-v|-V|-x|-h]"
        print "       No args - print out kernel values"
        print "       where -v describes what the kernel values mean"
        print "       where -V lists how to change them"
        print "       where -x prints out values in hex"
        print "       where -h prints out usage"
        exit 1
    fi
fi

#
# Main - no command line args
#

# find out kernel bit type
BITS=$(isainfo -kv | cut -d' ' -f1 )

# if kernel bit type cannot be determined or is 32
if [[ -z $BITS || $BITS == "32-bit" ]]


then
    # 32 bit
    # set 64 bit variables to 32 bit
    # not able to list any 64 as only 32 bit kernel
    # so list everything in 32 bit terms
    if (( HEX ))
    then
        P_32VALUE=X
        P_64VALUE=X
    else
        P_32VALUE=D
        P_64VALUE=D
    fi
else
    # 64 bit
    # use real 64 bit prints
    # set 64 bit variables to appropriate type
    # set 32 bit variables to appropriate type
    # you still list some 32 bit variables in 64 bit kernel
    # as not everything is 64 bit variables
    if (( HEX ))
    then
        P_32VALUE=X
        P_64VALUE=J
    else
        P_32VALUE=D
        P_64VALUE=E
    fi
fi

#
# main area where we display the actual kernel values
#
cat << EOF | adb -k /dev/ksyms /dev/mem | awk '{ if (NF == 2) printf ("%30s %-s \n", $1, $2) }' | grep : | more
autoup/$P_32VALUE
bufhwm/$P_64VALUE
coredefault/$P_32VALUE
consistent_coloring/$P_32VALUE
desfree/$P_64VALUE
dnlc_dir_enable/$P_32VALUE
dnlc_dir_min_size/$P_32VALUE
dnlc_dir_max_size/$P_32VALUE
doiflush/$P_32VALUE
dopageflush/$P_32VALUE
fastscan/$P_64VALUE


handspreadpages/$P_64VALUE
hires_tick/$P_64VALUE
kmem_flags/$P_32VALUE
kobj_map_space_len/$P_32VALUE
lotsfree/$P_64VALUE
lwp_default_stksize/$P_32VALUE
max_nprocs/$P_32VALUE
max_page_get/$P_64VALUE
maxpid/$P_32VALUE
maxpgio/$P_64VALUE
maxuprc/$P_32VALUE
maxusers/$P_32VALUE
minfree/$P_64VALUE
min_percent_cpu/$P_32VALUE
maxphys/$P_64VALUE
physmax/$P_64VALUE
moddebug/$P_32VALUE
ngroups_max/$P_32VALUE
ncsize/$P_32VALUE
ndquot/$P_32VALUE
nautopush/$P_32VALUE
noexec_user_stack/$P_64VALUE
nrnode/$P_32VALUE
nstrpush/$P_32VALUE
npty/$P_32VALUE
pageout_reserve/$P_64VALUE
pages_pp_maximum/$P_64VALUE
pages_before_pager/$P_64VALUE
physmem/$P_64VALUE
priority_paging/$P_64VALUE
pt_cnt/$P_32VALUE
pt_pctofmem/$P_32VALUE
pt_max_pty/$P_32VALUE
rechoose_interval/$P_32VALUE
reserved_procs/$P_32VALUE
rlim_fd_cur/$P_32VALUE
rlim_fd_max/$P_32VALUE
rstchown/$P_32VALUE
sadcnt/$P_32VALUE
segkpsize/$P_64VALUE
slowscan/$P_64VALUE
strmsgsz/$P_64VALUE
strctlsz/$P_64VALUE
swapfs_reserve/$P_64VALUE
swapfs_minfree/$P_64VALUE
throttlefree/$P_64VALUE


tmpfs_maxkmem/$P_64VALUE
tmpfs_minfree/$P_64VALUE
timer_max/$P_32VALUE
tune_t_flckrec/$P_32VALUE
tune_t_fsflushr/$P_32VALUE
tune_t_gpgslo/$P_32VALUE
tune_t_minarmem/$P_32VALUE
tune_t_minasmem/$P_32VALUE
ipc_tcp_conn_hash_size/$P_32VALUE
ip_icmp_err_interval/$P_32VALUE
ip_icmp_err_burst/$P_32VALUE
ip_forwarding/$P_32VALUE
ip6_forwarding/$P_32VALUE
ip_respond_to_echo_broadcast/$P_32VALUE
ip6_respond_to_echo_broadcast/$P_32VALUE
ip_send_redirects/$P_32VALUE
ip6_send_redirects/$P_32VALUE
ip_forward_src_routed/$P_32VALUE
ip6_forward_src_routed/$P_32VALUE
ip_addrs_per_if/$P_32VALUE
ip_strict_dst_multihoming/$P_32VALUE
ip6_strict_dst_multihoming/$P_32VALUE
udp_xmit_hiwat/$P_32VALUE
udp_recv_hiwat/$P_32VALUE
ufs_ninode/$P_32VALUE
ufs_LW/$P_32VALUE
ufs_HW/$P_32VALUE
ufs_WRITES/$P_32VALUE
tcp_conn_hash_size/$P_32VALUE
tcp_deferred_ack_interval/$P_32VALUE
tcp_deferred_ack_max/$P_32VALUE
tcp_wscale_always/$P_32VALUE
tcp_tstamp_always/$P_32VALUE
tcp_conn_req_max_q/$P_32VALUE
tcp_conn_req_max_q0/$P_32VALUE
tcp_conn_req_min/$P_32VALUE
tcp_time_wait_interval/$P_32VALUE
tcp_rev_src_routes/$P_32VALUE
tcp_sack_permitted/$P_32VALUE
tcp_slow_start_after_idle/$P_32VALUE
tcp_slow_start_initial/$P_32VALUE
tcp_cwnd_max/$P_32VALUE
tcp_max_buf/$P_32VALUE
tcp_recv_hiwat/$P_32VALUE
tcp_xmit_hiwat/$P_32VALUE
seminfo_semaem/$P_32VALUE


seminfo_semmsl/$P_32VALUE
seminfo_semmap/$P_32VALUE
seminfo_semmni/$P_32VALUE
seminfo_semmns/$P_32VALUE
seminfo_semmnu/$P_32VALUE
seminfo_semopm/$P_32VALUE
seminfo_semume/$P_32VALUE
seminfo_semvmx/$P_32VALUE
shminfo_shmmin/$P_32VALUE
shminfo_shmmni/$P_32VALUE
shminfo_shmmax/$P_32VALUE
shminfo_shmseg/$P_32VALUE
segspt_minfree/$P_32VALUE
scsi_options/$P_32VALUE
sd_io_time/$P_32VALUE
sd_max_throttle/$P_32VALUE
EOF

################################################################################
# This script is submitted to BigAdmin by a user of the BigAdmin community.
# Sun Microsystems, Inc. is not responsible for the
# contents or the code enclosed.
#
# Copyright 2005 Sun Microsystems, Inc. ALL RIGHTS RESERVED
# Use of this software is authorized pursuant to the
# terms of the license found at
# http://www.sun.com/bigadmin/common/berkeley_license.html
################################################################################
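Assuming the script above is saved as get_kernel_values (the name from the RCS log in its header) and made executable, typical invocations look like:

./get_kernel_values          # print the current kernel values in decimal
./get_kernel_values -x       # print them in hex
./get_kernel_values -v       # describe what each tunable means
./get_kernel_values -V       # show how to change values with adb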

posted by Brahma at 10:00 AM 0 comments

Reading a file and passing it as a value to a script.


Hi All,

I need some information on how to do the following.

I have a file (images.txt)

the contents are as follows...


image1.jpg
image23.jpg
image47.jpg
image100.jpg

.... and so on.

What I hope to do is have a script read this file and, for each entry,
perform an operation using ImageMagick.

How could I achieve this? I am using bash 3 on FreeBSD, if this makes a
difference.

-- Materialised

Subject: Re: [bash] Reading a file and passing it as a value to a script.

while read line; do
    ...
done < images.txt
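A fleshed-out version of that loop, assuming ImageMagick's convert is installed and using a made-up thumbnail operation and output directory:

mkdir -p thumbs
while read line; do
    convert "$line" -resize 200 "thumbs/$line"    # shrink each listed image to 200 pixels wide
done < images.txt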

-- William Park

posted by Brahma at 9:59 AM 0 comments

FTP issues

Re: FTP Issues
Posted By Don Rowland On Wednesday, June 22, 2005 at 12:16 AM

Rodrigo,

Here is a routine I use to test ftp, when users report ftp is not working.
It ftp's a file from machine-a to machine-b and then ftp's the file back
and compares the two files. If anything goes wrong, the log files contain
all of the ftp status


codes. This routine is a little overkill. If the .netrc file is correct on
the remote system, all you need are the ftp commands.

#
# ftp
# -d = enable debugging
# -v = show ALL responses from remote system
#

ftp -vd <<ftp_cmds1
open macb
prompt off
put test_file
quit
ftp_cmds1

The routine runs on machine-a (maca). test_file is sent to and retrieved
from machine-b (macb).

#! /bin/ksh
#
# ftp2macb - ftp a file from this system to macb and capture the ftp reply codes
# This routine tests ftp to verify the correctness of the file transfers.
#
# Set the following arguments to the ftp command to allow debugging
# -d = enable debugging
# -v = show ALL responses from remote system

## Don Rowland - Hewlett Packard Corporation - 01/28/2004

#
# Clean out any previous log files
#
if [ -f ftp_log1.* ]
then
    rm ftp_log1.*
fi
if [ -f ftp_log2.* ]
then
    rm ftp_log2.*
fi
if [ -f test_file.back ]
then


    rm test_file.back
fi

#
# Whoami?  I only want this routine to run as myself, especially not root.
#
ME=`whoami`
if [ $ME != "<my account here>" ]
then
    #
    # If you are NOT running as <my account>, exit this script
    #
    echo "You must be user <my account> to run ftp2macb!"
    echo "You are `whoami`; sorry!"; exit 1
fi

#
# Does the .netrc file exist on this system?
#
WHO=`ls -al /\.netrc | sed 's/  */ /g' | cut -d" " -f3`
#
# If the user and password in .netrc do not match the
# logged in user, exit this script
#
if [ $ME != $WHO ]
then
    echo ".netrc does not match logged in user! Sorry!"; exit 1
fi

#
# Send test file to macb
#
echo "\n\tSEND FILE"
START=`date +"%b %d %T"`
tail -1 /var/adm/messages > ftp_log1.$$
#
# ftp
# -d = enable debugging
# -v = show ALL responses from remote system
# record debug responses in file ftp_log1.$$
#
ftp -vd <<ftp_cmds1 >> ftp_log1.$$ 2>&1
open macb        <-- This is the host name of the remote system.
prompt off
put test_file
quit


ftp_cmds1
tail -1 /var/adm/messages >> ftp_log1.$$
STOP=`date +"%b %d %T"`
echo "Start = $START, Stop = $STOP" >> ftp_log1.$$

#
# If we got this far without hanging, purge the log file
#
# rm ftp_log1.$$

#
# Did it work?
#
# retrieve test file from macb
#
echo "\n\tRECEIVE FILE"
START=`date +"%b %d %T"`
tail -1 /var/adm/messages > ftp_log2.$$
#
# record debug responses in file ftp_log2.$$
#
ftp -vd <<ftp_cmds2 >> ftp_log2.$$ 2>&1
open macb        <-- This is the host name of the remote system
prompt off
get test_file test_file.back
quit
ftp_cmds2
tail -1 /var/adm/messages >> ftp_log2.$$
STOP=`date +"%b %d %T"`
echo "Start = $START, Stop = $STOP" >> ftp_log2.$$

#
# Does the file I sent out match the file I got back?
#
diff test_file test_file.back
sum test_file test_file.back >> ftp_log2.$$
sum test_file test_file.back

#
# If we got this far without hanging, purge the log file
#
# rm ftp_log2.$$

Good Luck,


Don

posted by Brahma at 9:55 AM 0 comments

root password has lost root password

Re: root login problems
Posted By mkatiyar On Tuesday, June 21, 2005 at 11:00 PM

hi Dan,

Sending you the article.......
*****************************************************************************************
Regaining control of a Solaris x86 system where the root password has
been lost can be accomplished by the following steps. Note that any
savvy user can do this with the proper CD-ROM and diskette. Therefore,
of course, physical security of a system is important for machines
containing sensitive data.

Insert the installation boot diskette and installation CD-ROM for Solaris x86.

Boot the system from the installation floppy and select the CD-ROM as the
boot device.

Type "b -s" (instead of typing 1 or 2 from the menu) and it'll drop
you straight to a root shell, #, (and you'll be in single-user mode).

At the root prompt, #, key in the following commands, which will
create a directory called hdrive under the /tmp directory and then
mount the root hard drive partition under this temporary directory.

mkdir /tmp/hdrive
mount /dev/dsk/c0t0d0s0 /tmp/hdrive     # SCSI; for ATAPI, omit "t0"

To use the vi editor, the TERM variable must be defined. Key in the
following commands.

TERM=at386
export TERM

Start vi (or some other editor) and load the /tmp/hdrive/etc/shadow file:

vi /tmp/hdrive/etc/shadow

Change the first line of the shadow file that has the root entry to:

root::6445::::::

Write and quit the vi editor with the "!" override command:

:wq!

Halt the system, remove the floppy installation diskette, and reboot
the system:


halt

When the system has rebooted from the hard drive, you can now log in from
the Console Login: as root with no password. Just hit enter for the
password.

After logging in as root, use the passwd command to change the root
password and secure the system.
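Before rebooting, it may be worth confirming the edit took; a quick check using the same temporary mount point as above:

grep '^root:' /tmp/hdrive/etc/shadow     # should show an empty password field: root::6445::::::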

[Thanks to Lynn R. Francis, Texas State Technical College]*****************************************************************************************

posted by Brahma at 9:52 AM 0 comments

Cloning on Solaris

RE: Cloning on Solaris
Posted By Mohammed Sadiq On Tuesday, June 21, 2005 at 6:23 AM

You can create a snapshot of the complete operating system with the fssnap
command. See details here:

http://docs.sun.com/app/docs/doc/817-6960/6mmah94ei?a=view

Regards.

-----Original Message-----
From: hemant_k via solaris-l [mailto:<email@removed>]
Sent: Tuesday, June 21, 2005 11:54 AM
To: Mohammed Sadiq
Subject: [solaris-l] Cloning on Solaris

hello,

I would like to know if there is any disk cloning utility for Solaris on
the SPARC platform ... for e.g. Symantec Ghost which we use in the Windows
environment...

regards,

Check out Flar Archive format, this format is similar to ignite or mksysb

and the man page of "flarcreate".
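For example, flarcreate can capture the running system into a single archive; a sketch (the archive name and NFS path are made up):

flarcreate -n "myhost-clone" -c /net/backupserver/export/flash/myhost.flar

The resulting archive can then be used as the install source for a JumpStart or interactive installation on comparable hardware.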

Hemant


posted by Brahma at 9:51 AM 0 comments

new virtual host in your httpd.conf

Create a new virtual host in your httpd.conf file.

i.e. vi /usr/local/apache/conf/httpd.conf

<VirtualHost newweb1>
    DocumentRoot /apachedocs/newweb
    ServerName www.newweb.com
    ErrorLog logs/www.newweb.com-error_log
</VirtualHost>

Hi guys,

Has anyone used the Apache web server on Sun Solaris 2.8? I need to know
how to create a new web site in the Apache web server. Please tell me as
soon as possible.
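After adding a virtual host like the one above, it is worth checking the configuration syntax and then restarting Apache gracefully; a sketch assuming the same install prefix as above:

/usr/local/apache/bin/apachectl configtest
/usr/local/apache/bin/apachectl graceful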

posted by Brahma at 9:50 AM 0 comments

Query for an existing bootblk

Subject: SUMMARY: Query for an existing bootblk

Hi *,

Many thanks to Hutin Bertrand, Kalyan Manchikanti, Brad Morrison, Joe
Fletcher & Siva Santhakumar.

My question was:

"I have to query several sun5.8 sparc boxes to see if the secondary
mirror has a bootblk installed.

If not I will need to call the following:

installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0

Someone suggested:

dd if=/dev/rdsk/c1t1d0s0 of=mirror.vtoc bs=512 count=1

Which returns


1+0 records in
1+0 records out"

************* SUNMANAGERS -

Hutin suggested:

I need to use std.vtoc to have something to compare to:

dd if=/dev/rdsk/c1t0d0s0 of=std.vtoc bs=512 count=1
dd if=/dev/rdsk/c1t1d0s0 of=mirror.vtoc bs=512 count=1

"cmp std.vtoc mirror.vtoc" and compare the outcomes; any difference
would suggest a different bootblk."

- I carried out this and sure enough found that a machine without the
platform-specific bootblk returned a different result.

Siva explained:

"The dd command takes its input from /dev/rdsk/c1t1d0s0 and writes a
single record of 512-byte block size. In your case, if your current boot
disk is c0t0d0s0 and the alternate boot disk is c1t1d0s0, then type:

dd if=/dev/rdsk/c0t0d0s0 of=/dev/rdsk/c1t1d0s0 bs=512 count=1

Make sure the disk names are correct."

Brad suggested and explained:

"When dd says:

1+0 records in
1+0 records out

it means that it has successfully read and written 1 record of the size
you specified (__, in this case). That is, "it worked". :-)

Anything else could be an error, unless your record size doesn't
correspond to what dd found. Try creating a file with 1025 bytes and run:

dd if=1025-byte-file of=/dev/null bs=1024

dd should answer with

1+1 records in
1+1 records out

to indicate one complete 1024-byte record, plus one partial record.


There's one more step after dd'ing the area where the bootblock should
be: compare it to the bootblock you'd install if you found it missing.
After dd'ing what's on a given system to some temp file, use cmp to
compare it to /usr/platform/`uname -i`/lib/fs/ufs/bootblk "
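Putting Brad's suggestion together, a rough sketch of that check (the mirror device is the one from the question; any difference reported by cmp only hints that the bootblk may be absent or different):

dd if=/dev/rdsk/c1t1d0s0 of=/tmp/mirror.bootarea bs=512 count=16
cmp /tmp/mirror.bootarea /usr/platform/`uname -i`/lib/fs/ufs/bootblk

As Joe points out below, simply re-running installboot is the safer answer when in doubt.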

- This proved to be the case, and helped me to verify further.

Joe suggested:

"Any reason why you don't just run the boot block installs on all the
disks and forget the rest? If it's already there it just gets
overwritten, if it's not there then the install is done."

***********

So it seems the dd command was not crucial to committing an installboot
(it does not cause any damage), as I could simply overwrite any existing
bootblk with the correct architecture - but I learnt a lot more in the
process, which is always a good thing :)

Thanks All!!!

Cheers,
Luke

posted by Brahma at 9:47 AM 0 comments

Script for changing the password

Subject: FW: SUMMARY: Script for changing the password

Hi Managers,

Here is the more compact, updated version of the script. Thanks to "Andrew
Hall" for advising me to use a case statement. This avoids using an NFS mount.

#!/usr/bin/ksh -x
################################################################################
# Script Written on 27th June 05.
# This script is used to change the root password of all unix hosts
# Make sure /usr/sysadm/scripts/minoti/all-hosts is updated with live
# hosts before running this script
################################################################################


for i in `cat /usr/sysadm/scripts/minoti/test-hosts`
do
    OS=`remsh $i uname`
    case $OS in
    SunOS)  rsh $i "rm /tmp/shad*"
            rsh $i "cp -p /etc/shadow /etc/shadow.2706"
            rsh $i "cat /etc/shadow|grep -v root>/tmp/shad1"
            rsh $i "echo "root:1EDHxu0aw6jRE:12958::::::">/tmp/shad2"
            rsh $i "cat /tmp/shad1>>/tmp/shad2"
            rsh $i "cp /tmp/shad2 /etc/shadow"
            rsh $i "/usr/sbin/pwconv"
            rsh $i "chown root:sys /etc/shadow"
            rsh $i "chmod 400 /etc/shadow"
            ;;
    HP-UX)  rsh $i "rm /tmp/shad*"
            rsh $i "rm /tmp/pass*"
            rsh $i "cp -p /etc/passwd /etc/passwd.2706"
            rsh $i "cat /etc/passwd|grep -v root>/tmp/shad1"
            rsh $i "echo "root:WkmiDJgfPbUB.:0:3::/:/sbin/sh">/tmp/shad2"
            rsh $i "cat /tmp/shad1>>/tmp/shad2"
            rsh $i "cp /tmp/shad2 /etc/passwd"
            rsh $i "chown root:other /etc/passwd"
            rsh $i "chmod 444 /etc/passwd"
            ;;
    Linux)  rsh $i "rm /tmp/shad*"
            rsh $i "cp -p /etc/shadow /etc/shadow.2706"
            rsh $i "cat /etc/shadow|grep -v root>/tmp/shad1"
            rsh $i "echo 'root:"$"1"$"hluzjp3u"$"bwx/ZLLAM4qANpMXTvBLz1:12961:0:99999:7:::'>/tmp/shad2"
            rsh $i "cat /tmp/shad1>>/tmp/shad2"
            rsh $i "cp /tmp/shad2 /etc/shadow"
            rsh $i "/usr/sbin/pwconv"
            rsh $i "chown root:root /etc/shadow"
            rsh $i "chmod 400 /etc/shadow"
            ;;
    IRIX*)  rsh $i "rm /tmp/shad*"
            rsh $i "/sbin/cp -p /etc/shadow /etc/shadow.2706"
            rsh $i "/sbin/cat /etc/shadow|grep -v root>/tmp/shad1"
            rsh $i "/sbin/echo "root:kN6gTIyyu5foo:12958::::::">/tmp/shad2"
            rsh $i "/sbin/cat /tmp/shad1>>/tmp/shad2"
            rsh $i "/sbin/cp /tmp/shad2 /etc/shadow"
            rsh $i "/sbin/pwconv"
            rsh $i "chown root:sys /etc/shadow"
            rsh $i "chmod 400 /etc/shadow"
            ;;


    *)      echo "platform $OS not supported"
            ;;
    esac
done

Regards
Minoti Koul

posted by Brahma at 9:43 AM 0 comments

Managing packages on Solaris

Managing packages on Solaris - A few useful commands:

To install: pkgadd -d /<directoryname>/ <packagename>
or change directories to the directory where the package is and type:
pkgadd -d <packagename>

To remove: pkgrm <packagename>

For info on a package: pkginfo -l <packagename>

To see where the package was installed: pkgchk -v <packagename>
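A quick example session (SUNWexample is a made-up package name and /var/spool/pkg is just a common spool directory):

pkgadd -d /var/spool/pkg SUNWexample
pkginfo -l SUNWexample
pkgchk -v SUNWexample
pkgrm SUNWexample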

posted by Brahma at 9:40 AM 0 comments

grep command

grep, egrep, fgrep - search a file for a pattern

----------------------------------------------------------------

SYNTAX
grep [ -bchilnsvw ] limited-regular-expression [ filename ... ]

egrep [-bchilnsvx] ... [file ...]

fgrep [-bchilnsv] ... [file ...]

----------------------------------------------------------------

DESCRIPTION
The grep family searches text files for a pattern and prints all lines
that contain that pattern. Be careful using the characters $, *, [, ^,
|, (, ), and \ in the pattern_list because they are also meaningful to


the shell. It is safest to enclose the entire pattern_list in single
quotes '...'.

grep - uses a limited set of regular expressions

fgrep - fast grep, searches for a string not a pattern

egrep - expression grep, uses full regular expressions

Options

-b  Precede each line by the block number on which it was found. This
    can be useful in locating block numbers by context (first block is 0).

-c  Print only a count of the lines that contain the pattern.

-h  Prevents the name of the file containing the matching line from being
    appended to that line. Used when searching multiple files.

-i  Ignore upper/lower case distinction during comparisons.

-l  Print only the names of files with matching lines, separated by
    NEWLINE characters. Does not repeat the names of files when the
    pattern is found more than once.

-n Precede each line by its line number in the file (first line is 1).

-s Suppress error messages about nonexistent or unreadable files.

-v Print all lines except those that contain the pattern.

-w Search for the expression as a word as if surrounded by \< and \>.

---------------------------------------------------------------

EXAMPLES

find info on pppd in the process table:

ps -ef | grep pppd

find all companies in CA in the file database:

grep CA database


find the word disk in upper or lower case in all files:

grep -i disk *
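find errors or warnings in the system log using a full regular expression with egrep (the target file is just an example):

egrep -i 'error|warning' /var/adm/messages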

posted by Brahma at 9:40 AM 0 comments

Change TCP-transmitbuffers:

Change TCP-transmitbuffers:

ndd -set /dev/tcp tcp_xmit_hiwat <value>

Change TCP-receivebuffers:

ndd -set /dev/tcp tcp_recv_hiwat <value>
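For example, to raise both buffers to 64 KB and confirm the change (the value is illustrative, and ndd settings do not survive a reboot unless reapplied from a startup script):

ndd -set /dev/tcp tcp_xmit_hiwat 65536
ndd -set /dev/tcp tcp_recv_hiwat 65536
ndd -get /dev/tcp tcp_xmit_hiwat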

posted by Brahma at 9:39 AM 0 comments

Deport/Import diskgroups

Disk Group Tips

Diskgroups can be moved from one system to another system. On the
original system, unmount any volumes in the disk group to be deported
and stop them with the command "vxvol stop". Then deport the disk
group with the command "vxdg deport".

Move the disks physically to the new system and either reboot the
system so that Veritas recognizes the new disks, or use the command
"vxdctl enable" to restart vxconfigd.

Import the disks with the command "vxdg import". Then restart all
volumes in the disk group with the command "vxrecover -g -sb".

To rename diskgroups during deport/import, run the deport/import
command with the argument "-n ". After importing the diskgroup, run
the command "vxprint -hrt". The volumes that belong to the imported
disk group may be DISABLED. Run the command "vxinfo "volume name"" to
be sure that the volumes are "startable". Then run "vxrecover -s" to
enable each volume.

If the disks in question have been moved due to the crash of the
original system, then they will not have been properly deported and
will be locked. To unlock the disks, run the command "vxdisk
clearimport" or "vxdg -C import". Be sure not to run these commands if
the original system has physical access to the disks since that would


allow access to the disks from multiple hosts and possibly cause data
corruption.

How To: Deport/Import

Disk groups can be moved from one system to another with the
deport/import commands. The same technique can be used to rename a
disk group by deporting the disk group with a new name.

1. Unmount any filesystems on the disk group

2. Stop any volumes on the disk group:

vxvol stop

3. Deport the disk group:

vxdg deport

4. Import the disk group:

vxdg import

5. Restart and resynchronize the disk group:

vxrecover -g -sb

6. Remount any filesystems from the disk group

If you have any problems, check the status of volumes on the disk group:

vxinfo

To rename the disk group during deport/import, use the -n argument:

vxdg deport -n
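A concrete walk-through of the steps above with made-up names (disk group "datadg", volume "vol01", mounted on /data):

umount /data
vxvol -g datadg stop vol01
vxdg deport datadg
# ...move the disks to the new host, then on that host:
vxdctl enable
vxdg import datadg
vxrecover -g datadg -sb
mount /dev/vx/dsk/datadg/vol01 /data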

posted by Brahma at 9:38 AM 0 comments

Veritas -- remove a mirror from a volume

How To: Mirror Removal

To remove a mirror from a volume (i.e., to remove one of the plexes
that belongs to the volume), run the following command:


vxplex -o rm dis

Any associated subdisks will then become available for other uses. To
remove the disk from Volume Manager control entirely, run the
following command:

vxdisk rm

For example, "vxdisk rm c1t1d0s2".

How To: Mirror Backup

The following techniques can be used to back up mirrored volumes by
temporarily taking one of the mirrors offline and then reattaching the
mirror to the volume once the backup has been run.

1. Disassociate one of the mirrors from the volume to be backed up:

vxplex dis

2. Create a new, temporary volume using the disassociated plex:

vxmake -g -U gen vol tempvol plex=

3. Start the new volume:

vxvol start tempvol

4. Clean the new volume before mounting:

fsck -y /dev/vx/rdsk//tempvol

5. Mount the new volume and perform the backup

6. Unmount the new volume

7. Stop the new volume:

vxvol stop tempvol

8. Disassociate the plex from the new volume:

vxplex dis

9. Reattach the plex to the original volume:


vxplex att

10. Delete the temporary volume:

vxedit rm tempvol
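A sketch of the whole backup cycle with made-up names (disk group "datadg", volume "datavol", mirror plex "datavol-02", mount point /backup_mnt):

vxplex -g datadg dis datavol-02
vxmake -g datadg -U gen vol tempvol plex=datavol-02
vxvol -g datadg start tempvol
fsck -y /dev/vx/rdsk/datadg/tempvol
mount /dev/vx/dsk/datadg/tempvol /backup_mnt
# ...run the backup, then:
umount /backup_mnt
vxvol -g datadg stop tempvol
vxplex -g datadg dis datavol-02
vxplex -g datadg att datavol datavol-02
vxedit -g datadg rm tempvol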

To display the current Veritas configuration, use the following command:

vxprint

To monitor the progress of tasks, use the following command:

vxtask -l list

To display information related to plexes, run the following command:

vxprint -lp

posted by Brahma at 9:38 AM 0 comments

Installing an Ethernet Card

Installing an Ethernet Card

To install an additional Ethernet interface (let's call it le1):

1) Put the card in and from the ok> prompt do a boot -r

1) create /etc/hostname.le1 that contains the host name

2) make the interface known to the system:

ifconfig le1 plumb

3) configure the interface:

ifconfig le1 up netmask + broadcast +

4) test the connection

Sun Network Interface Duplex

Checking and setting the link parameters of Sun hme network interfaces
requires the use of the ndd command. The procedure for qfe interfaces
is similar, except that the instance must be specified. For example,


the Sun QuadFast Ethernet card would have instances 0-3: qfe0, qfe1, qfe2, qfe3.

Checking the current running speed(s):

Choose the interface instance:

# ndd -set /dev/qfe instance 0

That selects the first instance: qfe0. Note that the default instance is 0.

Check the status, speed & mode:

# ndd -get /dev/qfe link_status
        1 = up
        0 = down
# ndd -get /dev/qfe link_speed
        1 = 100 Mb
        0 = 10 Mb
# ndd -get /dev/qfe link_mode
        1 = Full Duplex (FDX)
        0 = Half Duplex (HDX)

How to configure individual interfaces via ndd commands:

These commands are usually placed in a startup script such as /etc/rc2.d/S99qfe.

How to force 100Mbit/sec Full Duplex (FDX) on qfe1:

ndd -set /dev/qfe instance 1
ndd -set /dev/qfe adv_100T4_cap 0
ndd -set /dev/qfe adv_100fdx_cap 1
ndd -set /dev/qfe adv_100hdx_cap 0
ndd -set /dev/qfe adv_10fdx_cap 0
ndd -set /dev/qfe adv_10hdx_cap 0
ndd -set /dev/qfe adv_autoneg_cap 0

It is important to run these commands in the above order. The link
will be renegotiated when the final command is run.
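To make these settings survive a reboot they are usually wrapped in the startup script mentioned above; a minimal sketch (the file name follows the /etc/rc2.d/S99qfe example, and only the first and last ndd settings are repeated here):

#!/sbin/sh
# /etc/rc2.d/S99qfe - force qfe1 to 100 Mbit full duplex at boot
case "$1" in
'start')
        ndd -set /dev/qfe instance 1
        # ...the remaining adv_* settings from the list above go here, in the same order
        ndd -set /dev/qfe adv_autoneg_cap 0
        ;;
*)
        echo "Usage: $0 start"
        ;;
esac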

posted by Brahma at 9:36 AM 0 comments

Tape Control - Commands

Tape Control -the mt Command:


This assumes that the device is at the 0 address.

Shows whether device is valid, whether tape is loaded, and status of tape

mt -f /dev/rmt/0 status:

Rewinds tape to start

mt -f /dev/rmt/0 rewind:

Shows table of contents of archive. If tar tvf produces an error, then
there are no more records on the tape.

tar tvf /dev/rmt/0:

Advances to the next archive on the tape.

mt -f /dev/rmt/0 fsf:

Moves the tape to the end of the last archive that it can detect.

mt -f /dev/rmt/0 eom:

Erases the tape. Use with care.

mt -f /dev/rmt/0 erase:

Ejects the tape, if the device supports that option.

mt -f /dev/rmt/0 offline:

To extract lengthy archives even if you plan to log out, use the nohup
command as follows:

nohup tar xvf /dev/rmt/0 &

Identify the tape device

dmesg | grep st

Check the status of the tape drive

mt -f /dev/rmt/0 status

Tarring files to a tape


tar cvf /dev/rmt/0 *

Cpioing files to a tape

find . -print | cpio -ovcB > /dev/rmt/0

Viewing cpio files on a tape

cpio -ivtB < /dev/rmt/0

Restoring a cpio

cpio -ivcB < /dev/rmt/0

To compress a file

compress -v some.file

To uncompress a file

uncompress some.file.Z

To encode a file

uuencode some.file.Z some.file.Z

To unencode a file

uudecode some.file.Z some.file.Z

To dump a disk slice using ufsdump

ufsdump 0cvf /dev/rmt/0 /dev/rdsk/c0t0d0s0
    or
ufsdump 0cvf /dev/rmt/0 /export/home

To restore a dump with ufsrestore

ufsrestore rvf /dev/rmt/0

To duplicate a disk slice directly

ufsdump 0f - /dev/rdsk/c0t0d0s7 |(cd /home;ufsrestore xf -)
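For example, to extract the second tar archive on a tape using the no-rewind device (the archive position is illustrative):

mt -f /dev/rmt/0n rewind
mt -f /dev/rmt/0n fsf 1
tar xvf /dev/rmt/0n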

posted by Brahma at 9:36 AM 0 comments


How to Set a Netmask under Solaris

3.9: How to Set a Netmask under Solaris

In order to include a permanent netmask on your Solaris machine, you
must make an entry in the /etc/netmasks file, in the following format:

network-address netmask

For example:

% cat /etc/netmasks
150.101.0.0     255.255.255.0

The above would subnet the class B network, 150.101.0.0, into 254
subnets, from 150.101.1.0 to 150.101.254.0.

In 2.5.1 and below, only one entry for the entire class network is allowed,
to support standard subnetting as specified in RFC-950. It is important to
note that the entry in the left hand column must be the original base
network number (ie #.0.0.0 for a Class A, #.#.0.0 for a Class B and
#.#.#.0 for a Class C), not the subnet.

The 2.6 kernel has changed to support VLSM. It is now possible to combine
the RFC-950 and RFC-1519 forms of subnet masks in the netmasks file. The
network address should be the "SUBNETTED" address, NOT the standard
network number based on the host's IP address. (see man page for netmasks)

Here are a few examples:

A host address 192.188.206.65 with a netmask of 255.255.255.224

/etc/netmasks

192.188.206.64 255.255.255.224

A host address 172.31.16.193 with a netmask of 255.255.255.192

/etc/netmasks

172.31.16.192 255.255.255.192

See section 3.10 to get a better understanding of how to subnet by bit.

posted by Brahma at 9:34 AM 0 comments

configure multiple IP addresses for a single physical interface


With Solaris 2.x it is possible to configure multiple IP addresses for
a single physical interface. This allows a machine with a single
ethernet card to appear as an entire network of different machines.

SOLUTION SUMMARY:

In order to configure the lance ethernet (le0 or hme0) device to support more
than one IP address, do the following:

1. Create entries in /etc/hosts for each hostname your physical machine
will appear as.

128.195.10.31   myhost
128.195.10.46   myhost2
128.195.10.78   myhost3

2. Create /etc/hostname.le0:n files that contain the hostname for the
virtual host n. Note that hostname.le0:0 is the same as hostname.le0

/etc/hostname.le0     (Contains name myhost)
/etc/hostname.le0:1   (Contains name myhost2)
/etc/hostname.le0:2   (Contains name myhost3)

or

/etc/hostname.hme0    --> 10/100Mbit/sec high speed interface
/etc/hostname.hme0:1
/etc/hostname.hme0:2

The above changes will cause the virtual hosts to be configured at boot
time.

You can also directly enable/modify a logical host's configuration by
running ifconfig directly on one of the logical hosts using the le0:n
naming scheme.

% ifconfig le0:1 up
% ifconfig le0:1 129.153.76.72
% ifconfig le0:1 down

or

% ifconfig hme0:1 up
% ifconfig hme0:1 129.153.76.72
% ifconfig hme0:1 down
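On later Solaris releases the logical interface generally needs to be plumbed before it can be configured; a combined sketch (the address and netmask are illustrative):

% ifconfig hme0:1 plumb
% ifconfig hme0:1 129.153.76.72 netmask 255.255.255.0 up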


posted by Brahma at 9:33 AM 0 comments

file commands

How can I create a file of arbitrary size?

1. Method 1
   1. /usr/sbin/mkfile 10m file
      1. Creates a 10 Megabyte file
      2. For each write() operation, there is no accompanying
         read() operation, making this method very efficient
2. Method 2 (recommended)
   1. /usr/bin/dd < /dev/zero > file bs=1024 seek=10240 count=1
      1. Creates a 10 Megabyte file
      2. This method does not require many reads and writes
         since the file is sparse

How do I archive directories with 155+ character directory names
or 100+ character file names?

1. Solaris 2.6
   1. Sun's version of tar does not support this, use cpio
   2. /usr/bin/find . | /usr/bin/cpio -o > file.cpio
      1. -H tar produces a warning on files with the
         aforementioned attributes
2. Solaris 7 / 8 / 9 / 10
   1. Use the -E switch to enable extended headers

How do I get access, modify, creation time of a file?

1. Access time (atime)
   1. /usr/bin/ls -ul filename
2. Modify time (mtime)
   1. /usr/bin/ls -l filename
3. Creation time
   1. There is no way to determine creation time in the ufs filesystem
4. Change time (ctime)
   1. /usr/bin/ls -cl filename
   2. this includes status changes (like permissions)
5. All in one (root)
   1. /usr/bin/ls -i filename | /usr/bin/awk '{print "0t"$1":ino?i"}' | /usr/sbin/fsdb -F ufs /dev/rdsk/c0t0d0s0
      1. assumes the raw device of the filesystem for filename is c0t0d0s0

posted by Brahma at 9:32 AM 0 comments


Bourne Shell redirect stderror

3. How do I redirect stderr?
   1. Bourne Shell
      1. to stdout
         1. command 2>&1
      2. to file
         1. command 2> file
      3. to null
         1. command 2> /dev/null
   2. C Shell
      1. to stdout
         1. command >& /dev/tty
      2. to file (without affecting stdout)
         1. ( command > /dev/tty ) >& file
      3. to null (without affecting stdout)
         1. ( command > /dev/tty ) >& /dev/null

4. How do I rename files by extension like MS-DOS?
   1. DOS Example: move *.doc *.txt
   2. Korn Shell
      1. for x in *.doc; do mv $x ${x%.doc}.txt; done

posted by Brahma at 9:29 AM 0 comments

Kernel info


1. Where do I put kernel configuration?
   1. /etc/system

2. How do I add more PTYs?
   1. Solaris 2.6 / 7
      1. Do not attempt to do this with '/usr/bin/adb -k'.
      2. Modify /etc/system
         1. set pt_cnt=X
      3. /usr/sbin/reboot -- -r
   2. Solaris 8 / 9 / 10
      1. Dynamically allocated.
      2. Do not attempt to do this with '/usr/bin/mdb -k'.
      3. Limit is forced by modifying /etc/system
         1. set pt_cnt=X
      4. /usr/sbin/reboot -- -r


3. What is shared memory?
   1. Just as it sounds. Shared memory is an Interprocess
      Communication (IPC) mechanism used by multiple processes to access
      common memory segments.

4. How do I know the limits for shared memory kernel tunables?
   1. Read /usr/include/sys/shm.h

5. What is a semaphore?
   1. A non-negative integer that is incremented or decremented
      relative to available resources.

6. How do I know the limits for semaphore kernel tunables?
   1. Read /usr/include/sys/sem.h

7. What is a door?
   1. A door is a file descriptor that describes a method for
      interprocess communication between client and server threads.
   2. A door file appears with file mode D---------.

8. add_drv(1m) fails with "add_drv/rem_drv currently busy; try later".
   1. /usr/bin/rm /tmp/AdDrEm.lck

9. How do I increase the number of file descriptors available to an application?
   1. File descriptors are used for more than just open files;
      they also provide the framework for socket i/o.
   2. The kernel dynamically allocates resources for open files.
      There is no maximum number of file descriptors per system.
   3. Depending on the programming interface used, an
      application may not be able to reach the file descriptor limit.
   4. API limits (ref: Solaris Internals)
      1. Solaris 2.6
         1. stdio(3S) - 256
            1. From stdio(3S); no more than 255 files
               may be opened using fopen(), and only file descriptors 0 through 255
               can be used in a stream.
         2. select(3c) - 1024
            1. From select(3c); the default value for
               FD_SETSIZE (currently 1024) is larger than the default limit on the
               number of open files. It is not possible to increase the size of the
               fd_set data type when used with select().
      2. Solaris 7 / 8 / 9 / 10
         1. stdio(3s) - 256 (32-bit) / 65536 (64-bit)
         2. select(3c) - 1024 (32-bit) / 65536 (64-bit)
            1. 65536 limit is attainable on 32-bit Solaris 7


            2. #define FD_SETSIZE 65536 ; prior to includes
   5. System defaults
      1. Solaris 2.6 / 7
         1. rlim_fd_max 1024
         2. rlim_fd_cur 64
      2. Solaris 8
         1. rlim_fd_max 1024
         2. rlim_fd_cur 256
      3. Solaris 9 / 10
         1. rlim_fd_max 65536
         2. rlim_fd_cur 256
   6. Modify /etc/system
      1. set rlim_fd_max=8192 ; hard limit (per process)
      2. set rlim_fd_cur=1024 ; soft limit (per process)

10. What is a register window?
    1. A register window is used by the operating system to store
       the current local and in registers upon a system interrupt, exception,
       or trap instruction.
    2. Register windows are important to preserve the state of
       the stack between function calls.

11. What is the default memory page size?
    1. sun4u
       1. 8192 bytes
    2. sun4c/sun4m/sun4d
       1. 4096 bytes

12. What is the current memory page size?
    1. /usr/bin/pagesize

13. What kind of binaries can my kernel run?
    1. /usr/bin/isainfo -v

14. What kind of kernel modules can my kernel run?
    1. /usr/bin/isainfo -kv

posted by Brahma at 9:28 AM 0 comments

How to Make A Solaris Package

How to Make A Solaris Package

1. Make a Prototype File.


A Prototype file contains an exhaustive listing of the files you
want to package up. This file also defines file permissions, group
ownership, user ownership, and file type (file, directory, symbolic
link, etc).

The very beginning of the prototype file is the search path; you
have to tell the pkgmk program where to find all these source files
(otherwise called object files). Having the files listed in the
prototype file is just not enough, you have to explicitly declare each
and every subdirectory to the files you are bundling. You must include
every single directory in the search path!!! This will be quite
tedious if you have a very nested application.

To make this a bit easier, you are allowed to declare variables,
eg: !UL=/usr/local. The !UL is the variable declaration, or the
shortcut for /usr/local. In order to use this variable, you have to
preface the abbreviation with the $ sign, eg: $UL/bin. This is the
same as /usr/local/bin. This saves you 7 characters if you have to
type this over and over again. Here is an example of the prototype
header:

!TEB=/tmp/ebs/bin
!search /tmp /tmp/ebs $TEB $TEB/docs
i pkginfo=/tmp/ebs/pkginfo
d none /tmp 1777 root root
f none /tmp/ebs.txt 0755 user user
f none /tmp/ebs/bin/myprogram.sh 0755 user user
f none /tmp/ebs/bin/docs/mydoc.txt 0755 user staff

2. Now you are ready to build your prototype file.
3. Figure out which software you want to package up.
4. Change directory to the main directory, eg: cd /apps/some_software
5. Find all files and pipe output to the pkgproto command: find . -print | pkgproto > prototype
6. Edit the prototype file and put in the absolute path to the file, eg:

before: f none /bin/myprogram.sh
after:  f none /tmp/ebs/bin/myprogram.sh

7. Go over the prototype file with a fine tooth comb and check
   permissions and ownership.
8. Pay special attention to dot files, eg: .todayrc, .kshrc. They
   won't be captured for some reason. In order to get them to be included
   in the package build, you have to manually add them like:


f none /apps/some_software/.adotfile=/apps/some_software/.adotfile 0660 user group

9. Pay special attention to symbolically linked directories, as
   they won't be captured.
10. Once you are confident you have a complete prototype file, you
    can now make a description file.
11. Make a pkginfo file. This is the standard file to describe your
    package. Here is an example.

PKG="MyPackage"     * Note, no funny characters like _ or, -, or !.
NAME="Name"
VERSION="Version 1a"
ARCH="SPARC"
CLASSES="none"
CATEGORY="Application"
VENDOR="My Company"
PSTAMP="27thMay1999"
EMAIL="[email protected]"
ISTATES=" S s 1 2 3"
RSTATES=" S s 1 2 3"
BASEDIR="/"

12. Make a preinstall script (optional). This can be an actual shell
    script that checks certain conditions, or in my case, just a few echo
    commands reminding the installer what the prerequisites for the
    install are, etc.
13. Make a copyright file, eg:

- Include your special files in the prototype file, eg:
i pkginfo=/tmp/pkginfo
i copyright=/tmp/copyright
i preinstall=/tmp/preinstall
i postinstall=/tmp/postinstall

14. Build your package by running: sudo pkgmk -o -d $PWD -f ./prototype 2>err.txt
15. Note, if you do not run this as root, you may not have
    permission to access all files, and you will not preserve file
    ownership.
16. No doubt you will get a few errors, so might as well redirect
    output to a log (err.txt) file.
17. Inspect the error log for any problems; if you have success you will
    have messages like:

$ more err.txt


## Building pkgmap from package prototype file.
## Processing pkginfo file.
## Attempting to volumize 130 entries in pkgmap.
part  1 -- 94855 blocks, 133 entries
## Packaging one part.
/tmp/MyPackage/pkgmap
/tmp/MyPackage/pkginfo
/tmp/MyPackage/root/apps/my_software/DEVT/bin/16
/tmp/MyPackage/root/apps/my_software/DEVT/bin/AddDev.sh
/tmp/MyPackage/root/apps/my_software/DEVT/bin/AddUser.sh
/tmp/MyPackage/root/apps/my_software/DEVT/s01.i
/tmp/MyPackage/root/apps/my_software/DEVT/s02.d
/tmp/MyPackage/root/apps/my_software/DEVT/s02.i
/tmp/MyPackage/root/apps/my_software/DEVT/s13.i
/tmp/MyPackage/root/apps/my_software/env
/tmp/MyPackage/root/apps/my_software/tab
## Validating control scripts.
## Packaging complete.

18. If you have reached this step, you may now congratulate yourself
    for making your First Solaris Package!
19. Verify the package is present:
    $ ls -alrt
    drwxr-x---   3 user staff   185 May 27 10:53 MyPackage/
    Yep, it is there.
    $ du -sk MyPackage/
    47144   MyPackage
    Yep, it is quite big.
20. This matches up with the size of the original
    /apps/some_software, with the exception of a few files I chose not to
    include.
21. Check out what is in the package directory:

drwxr-x---   3 user staff    106 May 27 10:53 root/
-rw-rw----   1 user staff  10429 May 27 10:53 pkgmap
-rw-rw----   1 user staff    240 May 27 10:53 pkginfo

$ ls root/apps/

$ more pkginfo
PKG=MyPackage
NAME=My Software 1999
VERSION=Version 1a
ARCH=sun4u
CLASSES=none
CATEGORY=Application
VENDOR=My Company
EMAIL=[email protected]


ISTATES= S s 1 2 3
RSTATES= S s 1 2 3
BASEDIR=/

$ more pkgmap
: 1 94855
1 d none /apps 0755 root root
1 d none /apps/some_software 0755 user staff
1 d none /apps/my_software/DEVT 0770 user staff
1 d none /apps/my_software/DEVT/bin 2770 user staff
1 f none /apps/my_software/DEVT/bin/16 0660 user staff 0 0 925521753
1 f none /apps/my_software/DEVT/bin/lasmacro.quiet 0640 user staff 45091 47500 925362634
1 f none /apps/my_software/DEVT/bin/move_files.orig 0550 user staff 1382 35166 916612436

22. At this point in time, you are technically done. You have three options:
    1. keep the directory package as such.
    2. tar and compress the directory package.
    3. run pkgtrans to make the package into 1 single file
       (DataStream Package). This is my personal preference.
    4. Tar your package (so that you can compress or gzip it):
       tar cvf MyPackage.tar MyPackage
    5. Compress or GZIP your package. Use compress if you are not
       sure whether the recipient of the package has gzip (since it is
       not native to Solaris):
       $ compress -v MyPackage.tar

    It is probably a wise choice not to use gzip, since other
    locations (where you are sending the package) may NOT have gzip,
    since it is not native to Solaris. Compress is native.
    6. Or skip the tar and compress stuff, and just translate
       your package into 1 file (this simply makes the directory into a cpio
       archive file):
       sudo pkgtrans . PackageName.pkg
    7. pkgtrans will look for suitable packages and rename it to
       PackageName.pkg
    8. Now you can investigate, install or remove packages:
       * sudo pkginfo -l PackageName.pkg
       * sudo pkgadd -d .
       * sudo pkgrm

eg:

$ sudo pkgadd -d .


The following packages are available:
  1  MyPackage     Direct Lending System 1999
                   (sun4,sun4c,sun4m,sun4e) Today 511

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]: 1

Processing package instance from

Direct Lending System 1999
(sun4,sun4c,sun4m,sun4e) Today 511
Rabobank Group Sydney
## Processing package information.
## Processing system information.
## Verifying disk space requirements.
## Checking for conflicts with packages already installed.

posted by Brahma at 9:25 AM 0 comments

Upgrading VxVM

Upgrading VxVM quick and easy

Rather than using an encapsulated rootdisk, take a look at the
possibility of de-encapsulating the rootdisk and skip to Gene Trantham,
section 2). This methodology is tried and true, with fewer side
effects and better recovery options. If you still wish to have an
encapsulated rootdisk, see below.

Contributed by various people:

1. If the root disk is encapsulated by Veritas Volume Manager
   1. Break any root volume mirror (remove/detach the mirror plex
      from rootvol. This way, you can simply boot back off of this mirror
      plex if you need to get back quickly).
   2. Save a copy of your /etc/system file
      1. Comment out any "rootdevice" lines in the
         /etc/system file (remember to use '*' as the comment character!)
   3. Save your VxVM volumes in vfstab (by copying the current
      vfstab to another file, and copy the vfstab.prevm as vfstab; or
      comment out the Veritas volumes in the vfstab file)
   4. Remove any patches you may have applied to VxVM
   5. Reboot to single-user mode or multi-user mode to boot off
      non-Veritas initialized volumes.
   6. Remove (pkgrm) the current Veritas (SEVM=SUNW... or
      Veritas=VRTS...) packages
   7. Reboot

Page 386: Solaris Real Stuff

8. add (pkgadd) new packages from the Volume Manager cdrom9. Run vxinstall1. Do a custom install. ONLY encapsulate root disk intoarray - choose 4 to leave all other disks on all other controllersalone (or use an exclude file for the other disks)2. copy back your original (saved above) vfstab file as/etc/vfstab (or uncomment the volumes if applicable).3. Let it do its reboots10. Done: when it reboots, you should see the volumes listedin vfstab (if you have them in a diskgroup other than rootdg. If youhad them in rootdg, you're going to get warnings about rootdg numbersnot being the same.)

2. If the root disk is not encapsulated by veritas volume manager1. Save your VxVM volumes in vfstab (just in case -- bycopying the current vfstab to another file e.g. /etc/vfstab.bak)2. Remove any patches you may have applied to VxVM3. remove (pkgrm) the current Veritas (SEVM=SUNW... orVeritas=VRTS...) packages4. pkgadd the new Veritas (SEVM=SUNW... or Veritas=VRTS...) packages5. remove the '/etc/vx/reconfig.d/state.d/install-db' file(to tell Veritas it's not a fresh install).6. Apply any new patches that may be required7. reboot
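Here is a minimal hedged sketch of method 2 (root disk not encapsulated). The media path is an example only -- substitute whatever is on your Volume Manager CD-ROM, and check the release notes for any patches required before and after:

cp /etc/vfstab /etc/vfstab.bak            # step 1: save the volume entries
pkgrm VRTSvxvm                            # step 3: remove the old package
cd /cdrom/volume_manager/pkgs             # example path to the new media
pkgadd -d . VRTSvxvm                      # step 4: add the new package
rm /etc/vx/reconfig.d/state.d/install-db  # step 5: tell VxVM this is an upgrade, not a fresh install
init 6                                    # step 7: reboot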

Also please note that the above will work for any/all versions of Volume Manager (2.x and 3.x), but the package names may differ. For SEVM the package names began with 'SUNW'; example: SUNWvmdev, SUNWvxvm, etc. For versions 3.x and higher, or versions < 3.x bought directly from Veritas and not through Sun, the package names begin with 'VRTS'; example: VRTSvmdev, VRTSvxvm, etc. See this link for a version cross reference.

Gene Trantham writes, in his Best Practices Paper, about making the rootdg unencapsulated. When asked if this made upgrades to VxVM tricky, he responded:

The officially supported method of upgrading either VxVM or VxVM+OS is to run upgrade-start (located on the VxVM media). This script writes a new VTOC according to the geometry of the core OS volumes, whether the disk is encapsulated or not. In fact, it computes slice start and offset values at run time -- NOT from any saved VTOC or from the underlying slices which may be present on an encapsulated disk.

So, to answer your question, the method that I propose does not makeit any harder to upgrade VxVM. But -- and this is a pet peeve of mine


-- VxVM shouldn't have to be this hard to upgrade anyway. Consider a patch for SEVM. That does not require the unencapsulate, remove, replace, re-encapsulate procedure, so why should a software upgrade? The only difference is the revision number of the product.

I have successfully upgraded VxVM using two alternate approaches:

1. Manipulate the /var/sadm/install/contents file and the package database such that the VRTSvxvm and related packages are 'officially' not installed (yet the binaries are still in place). You may then pkgadd your new VRTSvxvm, overwriting the existing binaries.

   This is unlikely to ever be supported by either Veritas or Sun as an acceptable method for upgrade, so it should not be attempted in the field.

2. If you have a clone OS disk or the ability to boot from a network boot image and still operate VxVM (such as with MR and JumpStart), you may upgrade like so:
   * boot from alternate media w/ VxVM drivers loaded
   * mount rootvol and friends on /a
   * chroot /a pkgrm VRTSvxvm
   * cd /net/whereever/new/vxvm/is
   * pkgadd -R /a -d . VRTSvxvm

This second method will be discussed in a little more detail in an upcoming BluePrint article (August [2000], I believe).

Here is a more explicit expansion of method 1, used when the rootdisk is de-encapsulated following the guidelines of the best practices paper above:

1. pkgrm VRTSvmsa VRTSvmman VRTSvmdev
2. pkgrm.sh VRTSvxvm
3. cd /net/wherever/new/vxvm/is
4. pkgadd -d . VRTSvxvm
   (reply 'y' to the overwrite question)
5. Apply any patches for the new VxVM that may be available
6. Reboot ASAP

Caveats:

1. Upgrading VxVM only. No OS upgrade.
2. Little or no error trapping in the script. Could mangle the patch and package database (but only a little bit :-)

posted by Brahma at 9:25 AM 0 comments


How to move disks between VxVM diskgroups

How to move disks between VxVM diskgroups

by Tony Griffiths

This is not easily done. However, you may be able to move a volume from one diskgroup to another; beware, some people have messed this up. Here are the basic steps.

Let's say you have two diskgroups, sourcedg and targetdg. The sourcedg has a volume data1 that you want in targetdg. data1 is a simple volume with a subdisk on disk01, c1t0d0.

1. Back up the data in volume data1 in case this goes wrong.
2. Save the VM configuration for that particular volume (don't store the file in the volume):
   * vxprint -g sourcedg -hmQq data1 > /data.file
   * vxdisk list > /vxdisk.file   (save the disk name/device mappings)
3. Unmount, stop and remove the volume data1. Yes, that's right, remove it! (Removing a volume does not actually destroy the data on the disks, it simply deletes the mappings of the volume/plex/subdisk.)
4. Remove the disks that data1 resided on, and add them to the new diskgroup with the same DM name:
   * vxdg -g sourcedg rmdisk disk01
   * vxdg -g targetdg adddisk disk01=c1t0d0
5. Rebuild the volume mapping from the saved file:
   * vxmake -g targetdg -d /data.file
6. Start the volume:
   * vxvol -g targetdg start data1
(A quick verification sketch follows.)
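A hedged verification sketch after step 6, reusing the names from the example above (targetdg, data1):

vxprint -g targetdg -ht data1    # the plex and subdisk should now live in targetdg
vxinfo -g targetdg               # data1 should be reported as started

Then update /etc/vfstab (the device path changes from /dev/vx/dsk/sourcedg/data1 to /dev/vx/dsk/targetdg/data1) and remount.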

The above example is very simple, as the volume sat on only one disk. If that disk was used by other volumes that are not to be moved to the new DG, then we have a problem. You can only move a disk out of a DG when all its subdisks are gone. I always advise people to test this before REALLY trying it.

Using ESMvxbu.bin, by Espen Martinsen

After installing ESMvxbu.bin, you get some good scripts under /usr/vx/utils. The intention of these scripts is to assist in moving selected volumes from one diskgroup to another (or a new one).

Package contents:
  1.save_info        used to save information
  2.remove_from_DG   remove volumes from the diskgroup


  3.make_new_DG      builds the volumes up again in the new DG
  RUNME              documentation (sorry, it's in Norwegian)
  count_configs      counts which disks have VxVM config copies
  remove_disk        used by remove_from_DG
  remove_volume      used by remove_from_DG

OK, this is what you do. First, be sure to save your entire config as it is:

# /usr/vx/Save_All_DG /usr/vx

Then on with the job:

# cd /usr/vx/utils
# ./1.save_info volume volume volume .......
# ./2.remove_from_DG volume volume volume .......
  (Continue to run this script until it succeeds. It might fail and tell you to do something, like umount a filesystem, etc.)
# ./3.make_new_DG

Now everything should be fine again. You can now edit /etc/vfstab to switch diskgroup, and remount the filesystems or start some databases...

Remember one thing: you can't empty the entire "rootdg"; you should have something in it. I usually have what I call a "minimal rootdg"**, which consists of only a simple slice, usually 20 MB on slice 7 of the root disk.

On an existing system, do like this (make sure you have the slice!):

# sh
# SLICE=c0t0d0s7
# vxdctl add disk $SLICE type=simple
# vxdisk -f init $SLICE type=simple
# vxdg adddisk $SLICE

With a fresh install of VxVM, just do like this after the pkgadd of vxvm:

# SLICE=c0t0d0s7
# vxconfigd -m disable
# vxdctl init
# vxdg init rootdg
# vxdctl add disk $SLICE type=simple
# vxdisk -f init $SLICE type=simple
# vxdg adddisk $SLICE
# vxdctl enable
# vxdctl initdmp
# rm /etc/vx/reconfig.d/state.d/install-db


(No need for rebooting at all....)

I really hope you can use some of this; send me an email if you liked it!

** [Ed: caveat: If you use a simple slice for rootdg, and that disk or slice fails, there will be no rootdg, and none of your volumes will work anymore. VxVM requires there to be a rootdg. Also, neither Sun nor Veritas will support this configuration. They recommend a mirror of rootdg for this (among other) reasons. However, many people choose to do this anyway.]

Remarks on caveat:

What we are doing here is this:

In a lot of places, we don't want to mirror the bootdisk. Reason: when the original bootdisk has failed and been fixed, both rootdisk and rootmir are initialized, not encapsulated. This makes it difficult for the technicians who switch disks. It's also difficult if you, for some other reason, need the "original" partitioning.

Typically we do this where one of these conditions holds:
1. There is a cluster/HA that takes the services to another machine.
2. The system can allow 10 minutes of downtime.

We have a script that "clones" the bootdisk instead of mirroring it. This script seds through vfstab etc. (also /etc/vx/volboot). (Contact me for further information.)

OK, now if the "rootdisk" dies, the machine will crash (of course), but it will reboot immediately on the secondary disk.

Where we have machines standing alone which won't tolerate any downtime, we often use DiskSuite to mirror bootdisks/rootdg.

posted by Brahma at 9:25 AM 0 comments

Veritas (VM/FS) temp license expired

> Resolution:
> Boot single-user mode
> Back up the files under /etc/vx/licenses/lic
> Remove all the files in /etc/vx/licenses/lic
> Run /sbin/vxlicinst -k NEW_LICENSE_KEY
> Reboot server


> Question:
> > My temp license expired for VxFS and I need to replace
> > it with another temp key. The problem is that while
> > attempting to install the key, it errors out with:
> >
> >   "Error: Duplicate License key detected:"
> >
> > How do I remove the old key so that I can install the new temp key?
> >
> > Setup: Solaris 2.9 running Veritas 3.5

posted by Brahma at 9:24 AM 0 comments

Tuesday, June 21, 2005

Veritas: ID free space in a disk group

Veritas: ID free space in a disk group

There's a couple of ways to determine how much free space you have and/or can use in a particular disk group. To print out the entire free space, issue:

vxdg -g ${group} free | grep -i "^${group}" | \
  awk '{Sum += $6} END {printf ("Megs free: %.2f\n", Sum/2048)}'

That will tell you the grand total of megs free in ${group}'s disk group. That may or may not be the largest amount of space that you can use. Veritas can show you what it thinks the largest volume it can create using the layout of your choice by executing the following command:

vxassist maxsize layout=${layout}

where layout = simple | stripe | raid5

For example:

1. vxassist maxsize layout=simple
   Maximum volume size: 376832 (184 Megs)

2. vxassist maxsize layout=stripe
   Maximum volume size: 376832 (184 Megs)

3. vxassist maxsize layout=raid5
   Maximum volume size: 376832 (184 Megs)
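If more than one disk group is imported, vxassist can be pointed at a particular group with -g; a hedged example, with a made-up group name:

vxassist -g datadg maxsize layout=stripe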

posted by Brahma at 3:18 PM 0 comments

SUN: ID full/half duplex nic status

SUN: ID full/half duplex nic status

Standard disclaimer: Use the information that follows at your own risk. If you screw up a system, don't blame it on me...
mailto: [email protected]

To determine whether your NIC is full or half duplex, execute:

ndd /dev/hme link_mode

A 1 means full duplex; 0 means half.
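The same ndd node exposes a few other read-only parameters that are handy together; a short sketch (select the instance first with "ndd -set /dev/hme instance N" if you have more than one hme):

ndd /dev/hme link_status    # 1 = link up, 0 = link down
ndd /dev/hme link_speed     # 1 = 100 Mbit, 0 = 10 Mbit
ndd /dev/hme link_mode      # 1 = full duplex, 0 = half duplex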

posted by Brahma at 3:17 PM 0 comments

SUN: ID'ing which pkg owns a file

SUN: ID'ing which pkg owns a file

Standard disclaimer: Use the information that follows at your own risk. If you screw up a system, don't blame it on me...
mailto: [email protected]

On rare occasions, you'll see a message during a pkgadd saying that it's going to overwrite a file from a different package. The question I've always had is "How do I find out which package owns that file in the first place?"

Up to now, I've always done a grep ${file} /var/sadm/install/contents.

As it turns out, you can also pkgchk -lp ${file}
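A quick sketch of both approaches side by side; /usr/bin/ls is just an example path:

grep '^/usr/bin/ls ' /var/sadm/install/contents    # owning package(s) appear at the end of the line
pkgchk -l -p /usr/bin/ls                           # look for "Referenced by the following packages"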

posted by Brahma at 3:17 PM 0 comments

Filesystem Copy

Hey;

I'm involved in a very similar project at my current client. For filesystems, we're using a combination of find and cpio. You don't want to use mv as that eliminates your backout plan. cp and tar are both too slow for any serious data migration. I have a cohort that swears by ufsdump|vxfsdump; however, I've not been able to get verifiable results using it. I'm sure it's something I'm messing up; however, it wasn't worth fighting over. On HP systems, using fibre channel on EMC devices, we were getting over 200 gig per hour transfer rate. I understand the Sun systems are getting similar transfer rates.

The exact command is:

cd ${mp}; find . -xdev -print | cpio -pdumv /mnt${mp}
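A hedged usage example, assuming the old filesystem is mounted on /data and the new one on /mnt/data (both mount points are made up):

mp=/data
cd ${mp} && find . -xdev -print | cpio -pdumv /mnt${mp}

# Rough sanity check: file counts on source and target should match.
( cd ${mp} && find . -xdev -print | wc -l )
( cd /mnt${mp} && find . -xdev -print | wc -l )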

HTH;

PostPosted: Mon Oct 04, 2004 9:59 pm    Post subject: copy files using tar/cpio/rsync command

We can use the following tar command to copy the files.

bash:

( tar cfp - ./imap ) | ( cd /var/spool/ ; tar xfp - )

copy to remote machine using sshbash:

$ ( cd /to/directory/to/copy && tar cf - . ) | ssh user@remote_machine "( cd /directory/to/copy/to && tar xvpf - )"

Using the cpio command
bash:

find . -depth | cpio -pdumv /path/tobe/copied/to


PostPosted: Mon Oct 11, 2004 12:10 pm    Post subject: rsync copy

To copy files incrementally, we can use the rsync command

bash:


# Local copy
$ rsync -vrulHpogtSx /local /apps/

$ rsync -vrulHpogtSx --delete /local /apps/

# Remote copy
$ rsync --verbose --progress --stats --recursive 192.168.0.2::hdd/ .

Number of files: 278109
Number of files transferred: 240181
Total file size: 4492470951 bytes
Total transferred file size: 4463367304 bytes
Literal data: 4463366090 bytes
Matched data: 1214 bytes
File list size: 5691994
Total bytes sent: 4812050
Total bytes received: 4480019189

sent 4812050 bytes  received 4480019189 bytes  489209.84 bytes/sec
total size is 4492470951  speedup is 1.00
rsync error: some files could not be transferred (code 23) at /home/lapo/packang/tmp/rsync-2.6.3/main.c(1146)

# rsync over ssh:

$ /usr/bin/rsync -e ssh -avz --delete Expect-1.15 [email protected]:exce

To copy a tarred, compressed file from a remote machine.

This command transfers files faster than the scp command:

bash:

ssh <host> -x "cat file.tar.Z " | zcat - | tar xvf -

posted by Brahma at 3:17 PM 0 comments

Rsync

chevro...writes:> Ok, It works, thanks.

> And the magic command is:
> rsync -auv --delete SrcTree/ DestTree/

Page 395: Solaris Real Stuff

Add -c occasionally to be completely sure that the files are exactly the same. And -H for proper handling of hard links.

-- Dragan Cvetkovic,

To be or not to be is true. G. Boole No it isn't. L. E. J. Brouwer



Dave Hinz    Jun 13, 10:50 am
Newsgroups: comp.unix.admin
From: Dave Hinz
Subject: Re: Trees Synchronizing

>> Ok, It works, thanks.

>> And the magic command is:
>> rsync -auv --delete SrcTree/ DestTree/

> Add -c occasionally to be completely sure that the files are exactly the
> same. And -H for proper handling of hard links.

And note that you can use ssh as a transport rather than the "r" commands, if your network is set up with reasonable security.
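Putting the thread's suggestions together, a hedged example of a checksummed, hard-link-aware sync over ssh (host and paths are placeholders):

rsync -auvcH --delete -e ssh SrcTree/ user@remotehost:DestTree/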

posted by Brahma at 3:16 PM 0 comments

system panic rebooting after adding patch, looking for urgent solution please

Re: system panic rebooting after adding patch, looking for urgent solution please.
Posted By Brian King On Sunday, June 19, 2005 at 11:49 PM

mount /your/root/device /a
patchrm -R /a 105181-39

e.g.

mount /dev/dsk/c0t0d0s0 /a
patchrm -R /a 105181-39
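A minimal sketch of the whole recovery path, reusing the root slice and patch ID from the example above; boot from installation media first so the root filesystem can be mounted on /a:

ok boot cdrom -s                 # at the OBP prompt
# mount /dev/dsk/c0t0d0s0 /a
# patchrm -R /a 105181-39
# umount /a
# reboot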


posted by Brahma at 3:14 PM 0 comments

How can I use full-duplex ethernet?

4.13) How can I use full-duplex ethernet?

Sun's hme and later fast ethernet adaptors support full-duplex ethernet.

There are several ways of changing the default settings and forcing full-duplex mode; you may need to alter your switch settings as well. The problem with changing this setting is that it disables autonegotiation. Usually, this causes switches to fall back to half-duplex mode unless they are also configured to use full-duplex mode.

It is usually best to leave the settings alone and have both switch and Sun auto-negotiate unless problems arise.

Setting through /etc/system

set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_100fdx_cap=1

Setting with ndd

ndd -set /dev/hme adv_100hdx_cap 0
ndd -set /dev/hme adv_100fdx_cap 1
ndd -set /dev/hme adv_autoneg_cap 0

In case you have multiple instances, you need to select the specific hme instance first, e.g., use the following to select hme1:

ndd -set /dev/hme instance 1

If you need to query the device, you can interrogate various variables such as ``link_status'', ``link_speed'', etc.

Setting "adv_autoneg_cap", not necessarily changing it, will causere-negotiating of link speed/duplex settings.

The dmfe device cannot be configured using /etc/system but is configured either with ndd (on per-device nodes /dev/dmfe0, /dev/dmfe1) or by editing dmfe.conf.

posted by Brahma at 3:13 PM 0 comments


Why do swap -l, swap -s and /tmp disagree about the amount of swap

3.76) How can I grow a UFS filesystem?

You can grow but not shrink a UFS filesystem if you manage to increase the size of the partition it lives in, with the following command:

/usr/lib/fs/ufs/mkfs -G -M /current/mount /dev/rdsk/cXtYdZsA newsize

Specify the current mount point and raw device as well as the new size in 512-byte blocks.

You can do this even when the filesystem is mounted and in use.
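A hedged example, assuming /export lives on c0t1d0s6 and the slice has just been enlarged (device and size are made up); the slice's sector count in the prtvtoc partition table is the number to pass as newsize:

prtvtoc /dev/rdsk/c0t1d0s6                                      # read the new sector count of the slice
/usr/lib/fs/ufs/mkfs -G -M /export /dev/rdsk/c0t1d0s6 4194828   # grow the mounted filesystem (example size)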

3.81) Why do swap -l, swap -s and /tmp disagree about the amount of swap?

First of all, let's get the tmpfs issue (/tmp, /var/run) out of the way. The tmpfs filesystem is a filesystem that takes memory from the virtual memory pool. What it lists as the size of swap is the sum of the space currently taken by the filesystem and the available swap space, unless the size is limited with the size=xxxx option.

In other words, the "size" of a tmpfs filesystem has nothing to do with the size of swap; at most with the available swap.

The second confusing issue is what "swap" really is. Solaris defines swap as the sum total of physical memory not otherwise used and physical swap. This is confusing to some who believe that swap is just the physical swap space.

The "swap -l" command will list the swap devices and files configuredand how much of them is already in use.

The "swap -s" command will list the size of virtual swap. Physicalswap added to the physical memory. On systems with plenty of memory,"swap -l" will typically show little or no swap space use but "swap-s" will show a lot of swap space used.


posted by Brahma at 3:13 PM 0 comments

Acrobat Reader 7.0 Solaris

>> I see that there is now an Adobe Acrobat Reader version 7.0 (Sparc)
>> on www.adobe.com.
>> However I do not seem to be able to download it for some reason
>> (ardownload.adobe.com not found) - has anybody else managed to
>> download it?
>
> Try:
> ftp://ftp.adobe.com/pub/adobe/reader/unix/7x/7.0/enu/AdbeRdr70_solaris_enu.tar.gz
>
> I downloaded it yesterday, after quite a period where the server
> appeared to be dead.
>

Got it thanks.

I see acroread is part of Solaris 10, and there is some grubby binary wrapper called pdflaunch which I'm not sure what it does. Moving /opt/sfw/bin/acroread to acroread5.0 and adding a symbolic link to 7.0 seems to have done the trick.

I guess Sun will eventually issue a Sol 10 patch for Acrobat to take the old 5.0 out and pop the new 7.0 in?

posted by Brahma at 3:12 PM 0 comments

LIMITATIONS growfs

I poked around on the growfs man page and noticed this little blurb at the bottom:

LIMITATIONS

[...] The following conditions prevent you from expanding file systems: [...]
* When there is a local swap file in the target file system.
* When the file system is root (/), /usr, or swap.

I didn't realize that / and /usr were prevented from growing by growfs. Kinda throws a wrench into what Mike is trying to do.

posted by Brahma at 3:12 PM 0 comments

Questions regarding swap and /tmp


==============================================================================
TOPIC: Questions regarding swap and /tmp
==============================================================================

== 1 of 4 ==
Date: Fri 17 Jun 2005 06:15
From: suzanne.dorman

As you read this, please keep in mind that I am not a sys admin. I'm a lowly developer who is the only one in this small shop who knows anything about Unix (and that was a few years ago).

I am configuring our stand-alone Sun to accommodate Oracle 9i. One of the requirements is that the swap size be the same as or greater than the RAM (1 GB). The other requirement is that /tmp have at least 400 MB of space available and that /tmp not use a device of swap. After running the swap -l command, I can see that my swap is not large enough (it's about 500 MB).

swap -l

swapfile            dev   swaplo   blocks     free
/dev/dsk/c0t0d0s1   32,1      16  1058288   682528

By doing a df -k, I can see that my /tmp does indeed use a device of swap.

df -k

Filesystem              kbytes     used     avail  capacity  Mounted on
/dev/dsk/c0t0d0s0      2327951  2241035     40357       99%  /
/proc                        0        0         0        0%  /proc
mnttab                       0        0         0        0%  /etc/mnttab
fd                           0        0         0        0%  /dev/fd
swap                    363208      168    363040        1%  /var/run
swap                    363368      328    363040        1%  /tmp
/dev/dsk/c0t0d0s7     67731034 11322365  55731359       17%  /export/home
/export/home/dlataille 67731034 11322365 55731359       17%  /home/dlataille
/export/home/oracle   67731034 11322365  55731359       17%  /home/oracle

My /etc/vfstab file looks like this:


#device             device              mount          FS     fsck  mount    mount
#to mount           to fsck             point          type   pass  at boot  options
#
fd                  -                   /dev/fd        fd     -     no       -
/proc               -                   /proc          proc   -     no       -
/dev/dsk/c0t0d0s1   -                   -              swap   -     no       -
/dev/dsk/c0t0d0s0   /dev/rdsk/c0t0d0s0  /              ufs    1     no       -
/dev/dsk/c0t0d0s7   /dev/rdsk/c0t0d0s7  /export/home   ufs    2     yes      -
swap                -                   /tmp           tmpfs  -     yes      -

For reference, uname -a produced the following:

SunOS HARTFORD 5.9 Generic_117171-01 sun4u sparc SUNW,Sun-Fire-V250

My questions are:

1. Does /tmp being on a swap device mean that the swap file and /tmp share the same space?
2. How do I make /tmp no longer be on a swap device?
3. I believe I have to change the /etc/vfstab file to make any changes persist after reboot. If I do that, how can I specify a swap size?
4. /var/run is also on a swap device (see df -k results) and the size is very close to /tmp. Is the similar size a coincidence or is there a reason for it?

Thanks in advance!

== 2 of 4 ==
Date: Fri 17 Jun 2005 15:46
From: Thomas Maier-Komor

wrote:[ snip ]

first of all I would like to recommend reading the man page of tmpfs.

> 1. Does /tmp being on a swap device mean that the swap file and /tmp
> share the same space?

tmpfs is a filesystem which allocates its space in virtual memory. So the answer to your question is yes and no, as it will just allocate pages of virtual memory, which might eventually get pushed into swap space.


> 2. How do I make /tmp no longer be on a swap device?

remove the entry in vfstab. I would further recommend creating a new filesystem on an unused slice and mounting that instead, because unmounting /tmp means that /tmp will reside on /. Before making any such change, make sure you know what you are doing, because putting /tmp on a regular filesystem can reduce performance dramatically. Furthermore, you should make sure that /tmp will be cleaned up at boot, because it is when tmpfs is mounted on swap.

> 3. I believe I have to change the /etc/vfstab file to make any changes
> exist after reboot. If I do that, how can I specify a swap size?

you can add swap by either specifying swap slices or creating swap files. See the man page for swap.
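A hedged sketch of both knobs, with made-up sizes and paths: a 1 GB swap file under /export, and a 512 MB cap on the tmpfs /tmp (see swap(1M) and mount_tmpfs(1M)):

# Create and activate a 1 GB swap file:
mkfile 1024m /export/swapfile
swap -a /export/swapfile

# Make it permanent with a vfstab entry:
#   /export/swapfile  -  -  swap  -  no  -

# Cap the size of the tmpfs /tmp via a vfstab mount option:
#   swap  -  /tmp  tmpfs  -  yes  size=512m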

> 4. /var/run is also on a swap device (see df -k results) and the size
> is very close to /tmp. Is the similar size a coincidence or is there a
> reason for it?

AFAIK swap is automatically mounted on /var/run and you should leave it like this...

>> Thanks in advance!>

Cheers,

Tom

== 3 of 4 ==
Date: Fri 17 Jun 2005 15:57
From: Andreas F. Borchert

On 2005-06-17, suzanne.dorman wrote:
> 1. Does /tmp being on a swap device mean that the swap file and /tmp
> share the same space?

Yes.

> 2. How do I make /tmp no longer be on a swap device?


Create a free disk partition, newfs it, fix the /tmp line in /etc/vfstab, and reboot or umount/mount /tmp. (In this case, you should create a procedure that cleans up /tmp at boot time.)

> 3. I believe I have to change the /etc/vfstab file to make any changes
> exist after reboot. If I do that, how can I specify a swap size?

The swap size depends on the size of the swap partitions.

> 4. /var/run is also on a swap device (see df -k results) and the size
> is very close to /tmp. Is the similar size a coincidence or is there a
> reason for it?

This is done in /etc/init.d/buildmnttab (up to Solaris 9, somewhere else in Solaris 10). The swap file system has the big advantage that firstly it does not require a separate partition and secondly it does not need to be cleaned up. (Besides the advantage of speed, of course.)

Andreas.

== 4 of 4 ==
Date: Fri 17 Jun 2005 14:50
From: Rich Teer

On Fri, 17 Jun 2005 suzanne.dorman wrote:

> I am configuring our stand-alone Sun to accomodate Oracle 9i. One of
> the requirements is that the swap size be the same or greater than the
> RAM (1 GB). The other requirement is that /tmp have at least 400MB of
> space available and that /tmp does not use a device of swap.

I'm no Oracle expert, but that last piece of advice sounds a bit bogus to me. Be very generous with your swap space allocation, sure, and limit the size of /tmp using the appropriate mount option. But having /tmp on a ufs file system seems a bit daft to me.

> /dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /             ufs  1  no   -
> /dev/dsk/c0t0d0s7  /dev/rdsk/c0t0d0s7  /export/home  ufs  2  yes  -

For those UFS file systems, you should specify the "logging" option, unless you're using the latest version of S9 (or S10), which does this by default.

> SunOS HARTFORD 5.9 Generic_117171-01 sun4u sparc SUNW,Sun-Fire-V250


Uppercase hostnames are a bad idea, or at the very least go against convention. I'd rename it to hartford if I were you (and if that doesn't cause too much disruption).

> 1. Does /tmp being on a swap device mean that the swap file and /tmp
> share the same space?

Sort of. The bit of your swap space that isn't being used for virtual memory is available to tmpfs, and vice versa.

> 2. How do I make /tmp no longer be on a swap device?

Mount /tmp on a different device. But I'd advise against that for performance reasons.

> 3. I believe I have to change the /etc/vfstab file to make any changes
> exist after reboot. If I do that, how can I specify a swap size?

Man swap.

> 4. /var/run is also on a swap device (see df -k results) and the size
> is very close to /tmp. Is the similar size a coincidence or is there a
> reason for it?

Yes, they share the same "space".

--Rich Teer, SCNA, SCSA, OpenSolaris CAB member

posted by Brahma at 3:11 PM 0 comments

EMC event memory

==============================================================================
TOPIC: EMC event?

==============================================================================

== 1 of 3 ==
Date: Fri 17 Jun 2005 10:08
From: Scott Howard

Michael Tosch wrote:
> Dragan Cvetkovic wrote:


>> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 574815 kern.info] [AFT0] errID 0x0001d2fa.d7fd18c0 Corrected Mtag Error on J0406 is Persistent
>> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 214705 kern.info] [AFT0] errID 0x0001d2fa.d7fd18c0 MTAG Check Bit 3 was in error and corrected
>>
>> but the phrase "EMC event" returns nothing on both google and sunsolve
>> (otherwise lot of hits related to EMC, but no such HW here).
>
> A correctable bit error occurred.

Correct. The system has ECC (Error Checking and Correcting) memory, so it both detected and corrected the error without any impact to anything.

> Please turn to Sun Support, you will certainly get a hardware replacement.

No, you won't. Single, correctable memory errors do not indicate a problem with the memory. If the errors continue then it may warrant a replacement, but a single error certainly doesn't.

If you want to confirm, download the "cediag" tool from http://sunsolve.sun.com/pub-cgi/show.pl?target=cediag which will analyse your system and determine if any memory/CPUs need to be replaced.

Scott

== 2 of 3 ==
Date: Fri 17 Jun 2005 12:35
From: Michael Laajanen

HI,

> Hi,
>
> just noticed the following in a log file of SF280R (2 x 900MHz CPUs,
> Solaris 9 Generic_112233-11):
>
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 284628 kern.info] NOTICE: [AFT0] EMC Event detected by CPU0 at TL=0, errID 0x0001d2fa.d7fd18c0
> Jun 16 06:51:17 sc2   AFSR 0x00010000<EMC>.00080000 AFAR 0x00000000.06112130
> Jun 16 06:51:17 sc2   Fault_PC 0x11788e0 Msynd 0x0008 J0406
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 574815 kern.info] [AFT0] errID 0x0001d2fa.d7fd18c0 Corrected Mtag Error on J0406 is Persistent
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 214705 kern.info] [AFT0] errID 0x0001d2fa.d7fd18c0 MTAG Check Bit 3 was in error and corrected
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 458748 kern.info] [AFT2] errID 0x0001d2fa.d7fd18c0 PA=0x00000000.06112100
> Jun 16 06:51:17 sc2   E$tag 0x00000000.18000002 E$state_4 Invalid
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x000d8633.000d8614 0x000d8600.000d85f6 ECC 0x1ab
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x000d85ea.000d85c7 0x000d8576.000d8571 ECC 0x013
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x000d8552.000d807a 0x000d805c.000d8051 ECC 0x092
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x000d8030.000d801c 0x000d8016.000d8007 ECC 0x080
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
>
> but the phrase "EMC event" returns nothing on both google and sunsolve
> (otherwise lot of hits related to EMC, but no such HW here).
>
> What does the above mean?
>
> TIA,
>
> Dragan

Just curious, what CPU is UltraSPARC-III+?

/michael

== 3 of 3 ==
Date: Fri 17 Jun 2005 12:17
From: Gavin Maltby

Hi

Andreas Almroth wrote:
> Dragan Cvetkovic wrote:
>> Michael Tosch writes:
>>> Dragan Cvetkovic wrote:
>>>> Hi,
>>>> just noticed the following in a log file of SF280R (2 x 900MHz CPUs,
>>>> Solaris 9 Generic_112233-11):


>>>> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 284628 kern.info] NOTICE:
>>>> [AFT0] EMC Event detected by CPU0 at TL=0, errID 0x0001d2fa.d7fd18c0
>>>> Jun 16 06:51:17 sc2   AFSR 0x00010000<EMC>.00080000 AFAR
>>>> 0x00000000.06112130
>>>> Jun 16 06:51:17 sc2   Fault_PC 0x11788e0 Msynd 0x0008 J0406
>>>> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 574815 kern.info]
>>>> [AFT0] errID 0x0001d2fa.d7fd18c0 Corrected Mtag Error on J0406 is
>>>> Persistent
>>>> Jun 16 06:51:17 sc2 SUNW,UltraSPARC-III+: [ID 214705 kern.info]
>>>> [AFT0] errID 0x0001d2fa.d7fd18c0 MTAG Check Bit 3 was in error and
>>>> corrected

An EMC event is a hardware-corrected single-bit error experienced in the "MTag" portion of the cacheline.

MTags are not actually used on an SF280R - they're only used on bigger systems that use SSM mode: SF15K, SF12K, SF25K. Nonetheless they should always read as zero on systems that don't use them - they're still part of the checkword (128 bits + some metadata = 144 bits) stored in memory, and the ECC code protecting MTags (separate from that protecting the data) is still checked.

If this is an isolated event you have nothing to worry about - some bit flips are expected from time to time in memory, and this one just happened to hit an MTag. If this is part of a pattern of behaviour you may need to replace the memory module. This event will have been counted, and since you report no message bleating about replacement, consider the above informational only. In Solaris 10 it would have gone to the error log and not appeared in messages, and the diagnosis engine would do the Right Thing.

>>>> [snip]
>>>
>>> A correctable bit error occurred.
>>> Please turn to Sun Support, you will certainly get a hardware
>>> replacement.

No you should not - unless this is part of a pattern.

>>>> Thanks. But whose error is that: CPU, memory, cache, ...?

Memory. But it is an "error" and not necessarily a "fault".


>> Bye, Dragan
>
> Looks to me that memory module in position J0406 have a persistent
> problem, indicating the module is probably faulty/degraded.

No. The "Persistent" classification is a horrible term. It's actuallythe good case - it means that after initial event we had another lookand the error was still there. We then rewrite the memory address andcheck again, and if it is fixed we have the final classification of"Persistent" (meaning fixable!); if we could not clear it then it wouldhave been labelled "Sticky".

In Solaris 10 the terminology is relegated to the error log rather than messages, where hopefully it will confuse fewer people. In Solaris Express, OpenSolaris etc. the terminology has been replaced and the classification algorithm enhanced - that will likely appear in Solaris 10 Update 1.

> If it appear only once, then it is maybe just a glitch, but if you see
> more of these messages, then the memory should be replaced.

That part is true.

Gavin

posted by Brahma at 3:10 PM 0 comments

Friday, June 10, 2005

Fixing inode error on mirrored Root disks.

Subject: Followup: Fixing inode error on mirrored Root disks.

More followup here than summary - I will summarize when I can complete the fix (about two weeks):

The process utilized last night to fix the inode error on a mirrored metadevice started with a boot from a Solaris installation CD-ROM. This boot (OK> boot cdrom -s) did not work (see the messages below), and things automatically proceeded into a normal boot (and with that I lost my change window!).

The cdrom boot error messages were:


failed to read superblock.
failed to read superblock.
Boot load failed.

I am still researching these messages... The planned procedure to fix the inode problem looks like this:

d10 = Mirror
d11 = SubMirror
d12 = SubMirror

Process Steps:
1.) Init system into OBP
    init 0
2.) Boot system off CDROM, single-user mode (Solaris Install CD #1)
    boot cdrom -s
3.) Detach one of the sub-mirrors.
    /usr/sbin/metadetach d10 d11    (detach d11 SubMirror from d10 Mirror)
4.) Run fsck against the remaining (attached) sub-mirror.
    Make sure that the root partition is unmounted.
    fsck -y -o f /dev/rdsk/c1t1d0s0    (will fsck by default the remaining attached submirror d12)
5.) Init system into OBP
    sync
    init 0
6.) Boot system normally (note: system will come up normally, but not mirrored)
    boot
7.) Perform a metattach command, to attach the 'un-attached' sub-mirror.
    This will sync up the unattached submirror, fixing the inode problem. May take a couple of hours.
    /usr/sbin/metattach d10 d12    (reattach & resync d12 SubMirror to d10 Mirror)
    /usr/sbin/metastat | grep Resync    (check current percentage of resync completion)
8.) When completed, system is ready; reboot optional?

My next change window is in two weeks; if the above works, I will be summarizing that way. Much thanks to all the folks that offered help on this - I'm learning a lot!

I'm fairly new to Solaris administration... (Retread IBM Mainframe sysprog)


We have observed the following error message on one of our V1280 servers that has a mirrored root partition. This server is running Solaris 9, using Solaris Volume Manager for mirroring.

[ID 879645 kern.notice] NOTICE: /: unexpected free inode 427288, run fsck(1M) -o f

The gentleman that set this server up (top-notch sysadmin) went on to a better opportunity, so I am learning fast. Looks like the following sequence of steps would get me through a correction... I would appreciate any critiquing, or pointers, from more experienced folks. Thanks in advance - I will summarize in return.

1.) Bring down server to OBP level
2.) Boot server off CDROM, single user mode (installation CDROM #1?)
3.) Mount root partition "/dev/dsk/c1t0d0s0" on "/a" (1st submirror)
4.) Run fsck on "/a"
5.) umount "/a"
6.) Repeat process for second submirror.
7.) Reboot back into multi-user mode.

Again, thanks for any advice that might be offered.

posted by Brahma at 3:58 PM 0 comments

installing CDE packages

Subject: SUMMARY- installing CDE packages

Thanks to:

Dave FosterJonathan Birchall

Their solution was to install all the SUNWdt* packages and the SUNWxw* packages.

I was dealing with a customised operating environment; the chaps who wrote the jumpstart procedure had a stripped environment loaded onto this v120, so there was quite a lot missing. The same solution worked on two of the v240 boxes I was servicing also.

Thanks chaps, nice to know there are decent people prepared to help!

Sashah
Unix admin
Vodaphone


posted by Brahma at 3:57 PM 0 comments

pkgchk -l

This is what I get:

> uname -a
SunOS borsen1 5.10 Generic sun4u sparc SUNW,Sun-Fire-V240

# pkgchk -l -p /usr/bin/ssh
Pathname: /usr/bin/ssh
Type: regular file
Expected mode: 0555
Expected owner: root
Expected group: bin
Expected file size (bytes): 257288
Expected sum(1) of contents: 26512
Expected last modification: jan 23 03.18.29 2005
Referenced by the following packages:
        SUNWsshu
Current status: installed

> scp -p a oracle@host:/stage_area/a
oracle@tvpe's password:
scp: warning: Executing scp1 compatibility.
scp: FATAL: Executing ssh1 in compatibility mode failed (Check that scp1 is in your PATH).
lost connection

Why do I have to enter the password?
What does the error message say to me?

posted by Brahma at 3:57 PM 0 comments

Script for Monitoring Users in Unix Environment

RE: Script for Monitoring Users in Unix EnvironmentPosted By crazyrussian666 On Friday, June 03, 2005 at 9:29 PM

"last" command should work, also who, w, check /var/adm/sulog for pplusing su command. Also "finger" will give you some info

try rusers -l

who -u
rusers -l
last

posted by Brahma at 3:56 PM 0 comments


reboot email script

Hi,
Below is a simple script I put in rc2.d to alert me when a machine reboots.
The name is S99reboot.
------------------------
#!/bin/sh
# just a mail when a reboot occurs
mode=$1

if [ ! -d /usr/bin ]
then
        # /usr not mounted
        exit 1
fi

set `/usr/bin/id`
if [ $1 != "uid=0(root)" ]; then
        echo "$0: must be run as root"
        exit 1
fi
host="`uname -n`"
moi="my mail address"

case "$mode" in

'start')
        echo "Reboot of $host : " `date` | mailx -s $host $moi
        ;;

'stop')
        echo "Stop of $host : " `date` | mailx -s $host $moi
        ;;

*)
        echo "Option unknown"
        exit 1
        ;;
esac

exit 0
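A hedged install sketch; the init.d name is arbitrary, and the K01 link in rc0.d is what delivers the "stop" argument at shutdown:

cp S99reboot /etc/init.d/reboot_mail
chmod 744 /etc/init.d/reboot_mail
ln /etc/init.d/reboot_mail /etc/rc2.d/S99reboot
ln /etc/init.d/reboot_mail /etc/rc0.d/K01reboot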

posted by Brahma at 3:56 PM 0 comments

scsiinfo

try "scsiinfo" google for it ...


-----Original Message-----

Behalf Of Aali Naser
Sent: Monday, June 06, 2005 10:54 AM

Subject: Re: [SunHELP] SCSI Tool

Hello All,

My question is very general. I am looking for a tool to be used in a Solaris 2.8 environment for SCSI drive troubleshooting. A tool which can tell me if there is a problem with the drive itself, or any level of information.

TIA

posted by Brahma at 3:55 PM 0 comments

Disappearing metadb entries

TOPIC: Disappearing metadb entries

== 1 of 1 ==
Date: Mon 6 Jun 2005 16:48
From: Darren Dunham

UNIX admin wrote:
> This brings up an interesting design question though: I was taught in
> courses to use 3 replicas per disk, however as one adds more and more
> disks, this number becomes suboptimal because of quorum. What do you
> think the sweet spot would be when going beyond RAID1 (add another RAID5
> and three disks) and two-way mirroring?

The following appears in the SVM documentation:http://docs.sun.com/app/docs/doc/816-4520/6manpiei3?a=view

[...]
To avoid single points of failure, distribute state database replicas across slices, drives, and controllers. You want a majority of replicas to survive a single component failure. If you lose a replica (for example, due to a device failure), problems might occur with running Solaris Volume Manager or when rebooting the system. Solaris Volume Manager requires at least half of the replicas to be available to run, but a majority (half + 1) to reboot into multiuser mode.


A minimum of 3 state database replicas is recommended, up to a maximum of 50 replicas per Solaris Volume Manager disk set. The following guidelines are recommended:

For a system with only a single drive: put all three replicas on one slice.

For a system with two to four drives: put two replicas on each drive.

For a system with five or more drives: put one replica on each drive.
[...]
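A hedged sketch of the matching metadb commands, with made-up slice names (slice 7 reserved for replicas on each disk):

# Single drive: three replicas on one slice
metadb -a -f -c 3 c0t0d0s7

# Two to four drives: two replicas on each drive
metadb -a -f -c 2 c0t0d0s7 c0t1d0s7

# Five or more drives: one replica per drive
metadb -a -f c0t0d0s7 c0t1d0s7 c0t2d0s7 c0t3d0s7 c0t4d0s7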

posted by Brahma at 3:55 PM 0 comments

Network monitoring tools and usage

Re: Network monitoring tools and usagePosted By Rolando Quijivix On Friday, April 30, 2004 at 6:52 PM

Ranjit,

For starting: http://www2.rad.com/networks/1995/snmp/snmp.htm
You might get some useful links at http://www.faqs.org/faqs/snmp-faq/part1/

Some MIB information, where you might get some information for your hardware: http://www.snmplink.org/

best regards

----- Original Message -----
From:
Subject: [networkadmin-select] Re: Network monitoring tools and usage

> I have downloaded and installed the trial version of Solarwinds as well.
> I now have both WhatsUpGold and Solarwinds installed.
>
> Can you please suggest some good links where I can learn more about
> network monitoring or some online tutorials. I have not been able to
> find anything relevant through my search. I do not have much time to
> explore on my own and so prefer some tutorial or a guide.

> Sent: Thursday, April 29, 2004 12:10 AM
> To: Ranjit

Page 414: Solaris Real Stuff

> Subject: [networkadmin-select] Re: Network monitoring tools and usage
>
> Of course you have selected the best, but there is one more tool by which you
> can monitor the network, that is Solarwinds, which you can download from
> the Solarwinds site; all other information you require is given on the site.
>
> Hope this would solve your problem.
>
> Regards

posted by Brahma at 3:54 PM 0 comments

VXVM : which disk is failing?

TOPIC: VXVM : which disk is failing?
==============================================================================

== 1 of 2 ==
Date: Mon 6 Jun 2005 03:17
From: "didds"

Hi - when I run

vxstat -ff I get the following output :

                    FAILED
TYP NAME            READS    WRITES
vol Export              0         0
vol ExportData      11410  34420125
vol data1_prd7          0         0
vol data2_prd7          0         0
vol data3_prd7          0         0
vol data4_prd7          0         0
vol log1_prd7           0         0
vol master_prd7         0         0
vol sysprocs_prd7       0         0
vol tempdb1_prd7        0         0

and when I run vxprint -Aht I get (amongst others):

v  ExportData     fsgen       ENABLED ACTIVE  9632978 SELECT   -
pl ExportData-01  ExportData  ENABLED ACTIVE  9633456 CONCAT   -   RW


sd c1t2d1-01  ExportData-01  c1t2d1  0       2050272  0        c1t2d1  ENA
sd c1t2d2-01  ExportData-01  c1t2d2  0       2050272  2050272  c1t2d2  ENA
sd c1t3d3-01  ExportData-01  c1t3d3  0       407232   4100544  c1t3d3  ENA
sd c1t5d4-01  ExportData-01  c1t5d4  0       2050272  4507776  c1t5d4  ENA
sd c1t3d3-02  ExportData-01  c1t3d3  407232  1024128  6558048  c1t3d3  ENA
sd c1t3d4-01  ExportData-01  c1t3d4  0       1008     7582176  c1t3d4  ENA
sd disk01-01  ExportData-01  disk01  0       2050272  7583184  c1t0d0  ENA

vxdisk list indicates the following

c1t4d3s2 sliced c1t4d3 rootdg online failing

(but clearly isn't part of the volume that has problems!)

How can I identify which disk in the volume with problems is the culprit?

cheers

ian

== 2 of 2 ==
Date: Mon 6 Jun 2005 03:42
From: "didds"

vxstat -s -ff

!!

ian

posted by Brahma at 3:53 PM 0 comments

Solaris mdb, crash, adb Command Examples

Solaris mdb, crash, adb Command Examples

The Solaris mdb(1) utility replaces crash(1M) and adb(1). In fact, beginning with Solaris 9, crash and its man page are completely gone. This note lists some case usage examples of crash and adb that I collected over the years. The commands are all applicable in mdb. Using mdb (or crash or adb) usually involves dealing with hexadecimal numbers, being able to pick the correct one out of many in output, and applying the cryptic adb macros to them (only a small number of macros, shown in the Appendix, need no addresses or symbols). Without enough examples, you would never learn how to use these powerful tools. This note does not introduce you to these tools, which can be accomplished by reading the man pages and references [Panic] (Chapters 12, 13) and [Garden]. If you don't know or forget, at least remember the commands $q (quit) and $c (stack trace).

Examples in [Filesystems]

[p.111] crash -> proc | grep myprogram -> user [proc slot, 1st column of proc output] -> file [number following F under OPEN FILES...], which shows file ref cnt, fs type, proc offset in file, flags=read

[p.138] crash, vnode -l [value under crash file cmd ADDR column]

[p.139] crash, od -x [value under crash file cmd ADDR column]. Output is:
hexnum1: hexnum2 hexnum3 hexnum4 hexnum5
hexnum6: hexnum7 hexnum8 hexnum9 hexnum10
         (vfsp, stream, pages, type)

[p.139] adb -k[pages, hexnum9 above]$<pages shows file ops offset 0 meaning process at file beginning.

[p.144-5] crash, as -f [proc slot] -> [number under DATA column on a row of segvn_ops]$<segvn shows file ops offset and vp (vnode pointer) -> [vp from above]$<vnode -> [data from above]$<inode shows number, which is inode plus directory offset.

[p.151] p -f [proc slot] shows as (address space) -> [as from above]$<as -> [segs from above]$<seglist -> [data from above]$<segvn -> [vp from above]$<vnode -> [data from above]$<inode shows inode number

Examples in [Panic]

[p.39-40] adb -k -w /dev/ksyms /dev/mem -> rootdir/W 0 -> ls /Never do that unless you need a system panic!

[p.55] adb, $<threadlist

[p.81] adb, $<utsname, hw_provider/s, architecture/s, srpc_domain/s


[p.83] adb -k /dev/ksyms /dev/mem, time/Y, lbolt/X -> [time output]-([lbolt output]%0t100)=Y shows the boot time. Can also do: time/D, lbolt/D -> 0t[time output]-0t[lbolt output]=Y

[p.84] adb, *panicstr/s

[p.85] adb, $<msgbuf

[p.112] adb, rootfs$<bootobj, swapfile$<bootobj, dumpfile$<bootobj

[p.137,142] adb, $<proconcpu (a macro written by authors showing which process on which CPU)

[p.271] adb, [ADDR from ps -l]$<proc -> [pidp (used to be a typo pipd) from above]$<pid

[p.272-6] adb, [ADDR from ps -l]$<proc2u shows u area. At least on Solaris 10 x86, it doesn't show ofile (open files). -> [ofile from above] $<file -> [vnode from above]$<vnode -> [stram from above]$<stdata -> [wrq from above]$<queue -> [next from above]$<queue, [qinfo from above]$<qinit

[p.279] [qinfo from above]/X shows struct for ldterm.

[p.280] [first from above $<queue output]$<mblk, [rptr from above],[wptr-rptr]/C shows streams module output.

[p.317] [2nd arg of _trap() if shown in $c]$<regs -> [pc from above]?i (/i if not from core file)

[p.346] adb, $<traceall (Need to verify)

[p.352] adb, <sp$<stacktrace (Need to verify)

[p.390] adb, $<modules shows more module info than modinfo(1M).

[p.402] adb, $<cpus

[p.423-4] adb, [owner of addr$<inode]$<proc -> [uarea from above]$<u

Examples in [Garden]

[p.566] crash, dis[I'll update this section later...]

Other Examples


adb, $<cpus -> [lwp from above]$<lwp, [thread from above]$<thread

[Rodney,http://groups.google.com/groups?selm=37422E37.F4260028%40microworld.com]ps -elp, adb -k /dev/ksyms /dev/mem -> [ADDR from above]$<proc -> [tlist from above]$<thread -> [sp from above]$c, [WCHAN from ps output]$<mutex

REFERENCES

[Panic] Chris Drake, Kimberly Brown, "Panic! UNIX System Crash Dump Analysis", 1995, PTR-PH.
[Garden] Berny Goodheart, James Cox, "The Magic Garden Explained: the Internals of UNIX System V Release 4", 1994, Prentice Hall.
[Filesystems] Steve Pate, "UNIX Filesystems", 2003, Wiley.

APPENDIX

$ cat noaddrmacros.ksh
#!/usr/dt/bin/dtksh
# noaddrmacros: prints names of all adb macros that don't need an address,
# i.e. are called by $<macro instead of addr$<macro

cnttotal=0
cntnoaddr=0
cd /usr/lib/adb
for i in *; do
  if [[ $(file $i | awk '$2~/ascii/{print "ismacro"}') == "ismacro" ]]; then
    cnttotal=$((cnttotal + 1))
    if [[ $(head -1 $i | awk '$1!~/^\./ {print "noaddr"}') == "noaddr" ]]; then
      cntnoaddr=$((cntnoaddr + 1))
      echo $i
    fi
  fi
done
print -r "Total macros: $cnttotal; No-address macros: $cntnoaddr"

$ ./noaddrmacros.ksh    # On Solaris 10 x86
audiotrace
buflist
buflist.nxt
buflistiter.nxt
callouts
ce_rxbufhist.nxt
ce_rxcomphist.nxt
ce_txhist.nxt
ce_txhist.nxt1
cglist
cglist.nxt
cglistchk.nxt
cglistiter.nxt
cpu_dptbl.nxt
cpus
dispqtrace.list
dispqtrace.nxt
ill_g_heads
inodelist
inodelist.nxt
inodelistiter.nxt
kmastat
major2snode.nxt
modules
modules.brief
modules.brief.nxt
modules.nxt
mount
msgbuf
nca_conntrace
nca_doortrace
nca_nodetrace
panicbuf
phyint_list
setproc.nop
sleepq.nxt
slpqtrace
slpqtrace.list
slpqtrace.nxt
svcpool_list
systemdump
threadlist
traceall.nxt
u.sizeof
utsname
v
v_call
vfslist
v_proc
Total macros: 911; No-address macros: 49

So 95% of macros need an address or symbol as a starting address. [Panic p.110-1] tells us a trick to find the symbol. Suppose you're debugging the kernel and want to use the bootobj macro. Since this macro needs a starting address or symbol, let's find the symbol in the kernel:

nm /dev/ksyms | grep -i boot | grep OBJT


That doesn't find an object named *boot*. Then:

cd /usr/include/sys
find . -exec grep -i bootobj {} /dev/null \;

That finds the definition of the bootobj struct in bootconf.h, which also says rootfs, dumpfile and swapfile are of this type. These three variables also exist in `nm /dev/ksyms`. So you can use them as symbols for the bootobj macro, e.g. rootfs$<bootobj, dumpfile$<bootobj.

The 5% of macros that don't need an address won't necessarily work on core files. For example, $<threadlist works on a system crash dump or a live system (adb -k /dev/ksyms /dev/mem), but it doesn't work on a process core (adb [path/executable] core).

posted by Brahma at 3:52 PM 0 comments

strings command

STRINGS - OLD BUT A GOODIE

To check the content of an object file (binary file) we can't use vi or cat; for that, use the strings command.

% strings [name of binary file]

It will print all the printable strings present in the object file. Basically, the strings command looks for ASCII strings in an executable file and prints them.

Great for core files and other binary error files.
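A couple of hedged examples (file names are arbitrary):

strings core | grep -i panic      # hunt for panic strings or file paths in a core file
strings /usr/bin/ls | head -20    # peek at the printable strings in a binary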

posted by Brahma at 3:51 PM 0 comments

disksuite metadb

Be aware that the optimal replica configuration may be different from the one presented here, in the case that the data disks are also under control of Solaris Volume Manager software and can contain state database replicas.

Replica creation for the two-disk configuration (and disabling the quorum rule).


metadb -a -f -c 2 c0t0d0s7
metadb -a -c 2 c0t1d0s7
echo set md:mirrored_root_flag=1 >> /etc/system

If one of the disks is broken, Solstice DiskSuite software stops the boot process. The reason for this is that, to successfully boot after a failure, the majority of state database replicas must "survive." On our two-disk mirror, only three out of the six replicas would have survived.

Servers with more than two disks should always have a third disk configured with state database replicas. Two-disk workstations have to be fixed manually, as described in the following example:

" State Database Replicas are configured on c0t0d0s7 and c0t1d0s7,disk c0t0d0 fails.

" The boot process stops with a Solstice DiskSuite error message andswitches to Single User mode.

" Now the State Database Replicas on the failed disk have to beunconfigured: metadb -d -f c0t0d0s7

" After the next reboot, partition the replacement disk exactly as inthe surviving disk.

" Having done that, you now can create new State Database Replicas onthe replacement disk: metadb -a -c 3 c0t0d0s7 "

Upon the next reboot, the mirror will be resynchronized.

Hint: If you have a two-disk system, create three state database replicas on one disk, and four on the other one. Statistically there is a 50 percent chance to successfully reboot after a failure without manually using metadb. If you can tolerate unattended reboots after a disk failure in a two-disk configuration, create the same number of state database replicas on both disks and create this /etc/system entry: set md:mirrored_root_flag=1. More information on this topic can be found in the Sun BluePrints publication "Configuring Boot Disks With Solaris Volume Manager Software" (see: http://www.sun.com/blueprints/1002/817-0407-10.pdf).

posted by Brahma at 3:51 PM 0 comments

pkgchk


Dragan Cvetkovic wrote:
>
> pkgchk is your friend
>
> pkgchk -v <package name> lists all files in that package
>
> pkgchk -l -p <file name> shows the package name for a file, etc etc.
>

Is there a way to check for deps while installing or removing packages?

posted by Brahma at 3:50 PM 0 comments

adding disks under solaris

Adding Disks under Solaris

Once the disk has been physically installed, the system should recognize a new device on the SCSI bus. After powering up the system, hold down the Stop key (on some Suns, this is labeled L1), and hit the A key to enter the boot monitor.

At the boot monitor, probe-scsi can be used to list the SCSI devices the system recognizes:

Type 'go' to resume
Type help for more information
ok probe-scsi
..
Target 5
Unit 0  Disk  HP C37245 5153
..

Note: on some older Suns, it may be necessary to enter "n" at the boot monitor to enter the newer command mode before probing for disks.

After verifying that the new disk is recognized by the system, reboot the machine by issuing "boot -r" from the boot monitor. The -r option tells the system to reconfigure for the new device.

During the boot process, the new disk should be recognized and a message should be printed to the console. (On some Suns, it may not be printed to the screen, but will be written to the system log -- in this case, the dmesg command should be used to review the boot messages.) The messages should be similar to this:

sd5 at esp0: target 5 lun 0
sd5 is /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@5,0
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@5,0 (sd5):
        corrupt label - wrong magic number
Vendor 'HP', product 'C3724S', 2354660 512 byte blocks

In this example, the disk is located on controller 0, SCSI ID 5. The "corrupt label" warning means that the disk doesn't have a Solaris label on it yet.

Device nodes

The correct device nodes for the disk are automatically added when a "boot -r" is issued. If the system hasn't been rebooted using the -r option, the system can be configured for the new disk by hand (a hedged sketch follows).
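The original script isn't reproduced here; as a hedged sketch, on Solaris 7 and later the device nodes can usually be created online with devfsadm, while older releases used the drvconfig/disks/devlinks trio:

# Solaris 7 and later:
devfsadm -c disk

# Older releases (roughly equivalent):
drvconfig
disks
devlinks

Formatting, Partitioning and Labeling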

The format utility is used to format, partition, and label disks. It is menu driven. The raw disk device is given as an argument; if no argument is given, format will print a list of available disks and ask the user to pick one.

# format /dev/rdsk/c0t5d0s2
selecting /dev/rdsk/c0t5d0s2
[disk formatted]

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        quit

Typing format at the prompt will perform a low-level format on the disk. This is usually not necessary with a new disk, since they generally come pre-formatted, but it may help to map out any additional defects the drive may have developed.

The next step is to partition the drive. Type partition at the prompt to switch to the partition menu:

format> partition

PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        quit

Type in print to get a listing of the current partition table. Note that the second partition represents the entire disk:

partition> print
Current partition table (original):
Total disk cylinders available: 3361 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       0               0         (0/0/0)          0
  1 unassigned    wm       0               0         (0/0/0)          0
  2     backup    wu       0 - 3360        1.12GB    (3361/0/0) 2352700
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6 unassigned    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0               0         (0/0/0)          0

Page 425: Solaris Real Stuff

We will be splitting the disk up into two equal partitions, numbers 3 and 4. The first partition will span cylinders 0 through 1680, the second will span cylinders 1681 through 3360. The partition size can be specified in blocks, cylinders, or megabytes by using the b, c, and mb suffixes when entering the size.

partition> 3
Part      Tag    Flag     Cylinders        Size            Blocks
  3 unassigned    wm       0               0         (0/0/0)           0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 0
Enter partition size[0b, 0c, 0.00mb]: 1680c
partition> 4
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 1681
Enter partition size[0b, 0c, 0.00mb]: 1680c

Once the disk has been partitioned, the label should be written to the disk:

partition> label
Ready to label disk, continue? y

The new partition table can be printed from the format utility, or may be viewed using the prtvtoc command:

# prtvtoc /dev/rdsk/c0t5d0s2
* /dev/rdsk/c0t5d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*     140 sectors/track
*       5 tracks/cylinder
*     700 sectors/cylinder
*    3363 cylinders
*    3361 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*     1176000        700   1176699
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       2      5    01          0   2352700   2352699
       3      0    00          0   1176000   1175999
       4      0    00    1176700   1176000   2352699
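As an aside (not part of the original procedure), the same VTOC can be copied onto a second disk of identical geometry without going through the menus; a hedged sketch, where c0t6d0 is a hypothetical target disk:

# prtvtoc /dev/rdsk/c0t5d0s2 | fmthard -s - /dev/rdsk/c0t6d0s2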

Creating new filesystems

Finally, new filesystems can be created on the disk using the newfs command, and each filesystem is checked for integrity using fsck:

# newfs /dev/rdsk/c0t5d0s3
newfs: construct a new file system /dev/rdsk/c0t5d0s3: (y/n)? y
/dev/rdsk/c0t5d0s3: 1176000 sectors in 1680 cylinders of 5 tracks, 140 sectors
        574.2MB in 105 cyl groups (16 c/g, 5.47MB/g, 2624 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 11376, 22720, 34064, 45408, 56752, 68096, 79440, 89632, 100976, 112320,
 123664, 135008, 146352, 157696, 169040, 179232, 190576, 201920, 213264,
 224608, 235952, 247296, 258640, 268832, 280176, 291520, 302864, 314208,
 325552, 336896, 348240, 358432, 369776, 381120, 392464, 403808, 415152,
 426496, 437840, 448032, 459376, 470720, 482064, 493408, 504752, 516096,
 527440, 537632, 548976, 560320, 571664, 583008, 594352, 605696, 617040,
 627232, 638576, 649920, 661264, 672608, 683952, 695296, 706640, 716832,
 728176, 739520, 750864, 762208, 773552, 784896, 796240, 806432, 817776,
 829120, 840464, 851808, 863152, 874496, 885840, 896032, 907376, 918720,
 930064, 941408, 952752, 964096, 975440, 985632, 996976, 1008320, 1019664,
 1031008, 1042352, 1053696, 1065040, 1075232, 1086576, 1097920, 1109264,
 1120608, 1131952, 1143296, 1154640, 1164832,
# fsck -y /dev/rdsk/c0t5d0s3
** /dev/rdsk/c0t5d0s3
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2 files, 9 used, 551853 free (13 frags, 68980 blocks, 0.0% fragmentation)
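The article stops at fsck; a hedged sketch of the usual next step, mounting the new filesystem and making it persistent (the mount point /export/data1 is just an example):

# mkdir -p /export/data1
# mount /dev/dsk/c0t5d0s3 /export/data1
# vi /etc/vfstab               (add a line like the following)
/dev/dsk/c0t5d0s3  /dev/rdsk/c0t5d0s3  /export/data1  ufs  2  yes  -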

posted by Brahma at 3:50 PM 0 comments

LDFLAGS


> When I compile Samba for Solaris, I always have to link
> /usr/local/lib/libiconv.so.2 to /usr/lib/ first, otherwise I get a
> dreaded swat error in my web browser griping about not finding
> libiconv.so.2

Your environment requires LDFLAGS so that proper configure scripts can set up the Makefiles correctly.

For libraries in /usr/local/lib only:

export LDFLAGS='-L/usr/local/lib -R/usr/local/lib'

Please read the man pages for ld and ld.so.1. The purpose of the -R argument is well described there.
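A hedged sketch of putting that together for a build (the path to the resulting smbd binary is an assumption; adjust it to wherever your build tree puts it):

$ LDFLAGS='-L/usr/local/lib -R/usr/local/lib' ./configure
$ make
$ ldd source/bin/smbd | grep iconv       (should now resolve without copying anything to /usr/lib)
$ dump -Lv source/bin/smbd | grep PATH   (shows the RPATH/RUNPATH recorded by -R)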

I just looked into this issue, with a different application. In Solaris 9 and 10, and likely earlier Solaris versions, the iconv functions are in libc. There's no need for a separate libiconv.

The configure script may be confused if you have libiconv installed. I'd recommend removing it. You may also be able to tell configure not to use it.

This is on Solaris 9:

$ nm /usr/lib/libc.so | grep iconv
[4769] | 274488|  56|FUNC |GLOB |0 |9   |_iconv
[3886] | 274400|  88|FUNC |GLOB |0 |9   |_iconv_close
[3517] | 272724| 180|FUNC |GLOB |0 |9   |_iconv_open
[4307] | 274488|  56|FUNC |WEAK |0 |9   |iconv
[1185] |      0|   0|FILE |LOCL |0 |ABS |iconv.c
[4672] | 274400|  88|FUNC |WEAK |0 |9   |iconv_close
[4959] | 272724| 180|FUNC |WEAK |0 |9   |iconv_open
[1193] | 273640| 200|FUNC |LOCL |0 |9   |iconv_open_all
[1197] | 274040| 360|FUNC |LOCL |0 |9   |iconv_open_private
[1194] | 273840| 200|FUNC |LOCL |0 |9   |iconv_search_alias

--

posted by Brahma at 3:49 PM 0 comments

loss of fiber channel veritas

BTC wrote:
> Hi,
>


> One of my servers briefly lost its fiber channel connection to its 3510
> array (don't ask)!
>
> The connection was restored quickly, but not before vxvm got itself in a
> knot. (See the output of vxdisk and vxprint below.)
>
> I'm not sure how to recover from this. I have been through the Veritas
> doco but that doesn't seem to cover a situation like this.
>
> I have seen a few references to using "vxdctl enable", but I am
> concerned that this might make matters worse. Considering I have a few
> terabytes of data on these disks I'm not keen on having to restore from
> tape. So will "vxdctl enable" do anything "bad" like dropping the
> configs for my volumes?
>
> Any assistance would be greatly appreciated.
>
> Cheers,
>
> BTC
>
> # vxdisk list
> DEVICE       TYPE      DISK         GROUP        STATUS
> c1t0d0s2     sliced    rootdisk     rootdg       online
> c1t1d0s2     sliced    nsr00        nsr          online
> c1t2d0s2     sliced    rootdg00     rootdg       online
> c1t3d0s2     sliced    nsr01        nsr          online
> c1t4d0s2     sliced    datadg00     datadg       online
> c1t5d0s2     sliced    datadg01     datadg       online
> fabric_0     sliced    -            -            online
> fabric_1     sliced    -            -            online
> fabric_2     sliced    -            -            online
> fabric_3     sliced    -            -            online
> fabric_4     sliced    -            -            online
> fabric_5     sliced    -            -            online
> fabric_6     sliced    -            -            online
> fabric_7     sliced    -            -            online
> fabric_8     sliced    -            -            online
> fabric_9     sliced    -            -            online
> fabric_10    sliced    -            -            online
> fabric_11    sliced    -            -            online
> -            -         datadg02     datadg       failed was:fabric_0
> -            -         datadg03     datadg       failed was:fabric_1
> -            -         datadg04     datadg       failed was:fabric_2
> -            -         datadg05     datadg       failed was:fabric_3
> -            -         datadg06     datadg       failed was:fabric_4
> -            -         datadg07     datadg       failed was:fabric_5
> -            -         datadg08     datadg       failed was:fabric_6
> -            -         datadg09     datadg       failed was:fabric_7
> -            -         datadg10     datadg       failed was:fabric_8
> -            -         datadg11     datadg       failed was:fabric_9
> -            -         datadg12     datadg       failed was:fabric_10
> -            -         datadg13     datadg       failed was:fabric_11
>
>
> # vxprint -g datadg
> TY NAME          ASSOC         KSTATE    LENGTH     PLOFFS     STATE     TUTIL0  PUTIL0
> dg datadg        datadg        -         -          -          -         -       -
>
> dm datadg00      c1t4d0s2      -         143328960  -          -         -       -
> dm datadg01      c1t5d0s2      -         143328960  -          -         -       -
> dm datadg02      -             -         -          -          NODEVICE  -       -
> dm datadg03      -             -         -          -          NODEVICE  -       -
> dm datadg04      -             -         -          -          NODEVICE  -       -
> dm datadg05      -             -         -          -          NODEVICE  -       -
> dm datadg06      -             -         -          -          NODEVICE  -       -
> dm datadg07      -             -         -          -          NODEVICE  -       -
> dm datadg08      -             -         -          -          NODEVICE  -       -
> dm datadg09      -             -         -          -          NODEVICE  -       -
> dm datadg10      -             -         -          -          NODEVICE  -       -
> dm datadg11      -             -         -          -          NODEVICE  -       -
> dm datadg12      -             -         -          -          NODEVICE  -       -
> dm datadg13      -             -         -          -          NODEVICE  -       -
>
> v  Admin         fsgen         DISABLED  37748736   -          ACTIVE    -       -
> pl Admin-01      Admin         DISABLED  37755520   -          NODEVICE  -       -
> sd datadg02-02   Admin-01      DISABLED  37755520   0          NODEVICE  -       -
>
> v  CDBurning     fsgen         ENABLED   142606336  -          ACTIVE    -       -
> pl CDBurning-01  CDBurning     ENABLED   142606464  -          ACTIVE    -       -
> sd datadg00-01   CDBurning-01  ENABLED   142606464  0          -         -       -
>
> v  Computing     fsgen         DISABLED  209715200  -          EMPTY     -       -
> pl Computing-01  Computing     DISABLED  209715264  -          NODEVICE  -       -
> sd datadg01-01   Computing-01  ENABLED   142606464  0          -         -       -
> sd datadg02-07   Computing-01  DISABLED  67108800   142606464  NODEVICE  -       -
>
> v  Dump          fsgen         DISABLED  2097152    -          EMPTY     -       -
> pl Dump-01       Dump          DISABLED  2116928    -          NODEVICE  -       -
> sd datadg02-06   Dump-01       DISABLED  2116928    0          NODEVICE  -       -
>
> v  Finance       fsgen         DISABLED  48234496   -          ACTIVE    -       -
> pl Finance-01    Finance       DISABLED  48237568   -          NODEVICE  -       -
> sd datadg12-01   Finance-01    DISABLED  48237568   0          NODEVICE  -       -

vxdctl enable can be run even if vxconfigd is running in enabled mode. It looks for new devices that have been added since vxconfigd was last started.

You can use /etc/vx/bin/vxreattach -c c?t?d?. If it displays the diskgroup and disk name, you can then use /etc/vx/bin/vxreattach -r c?t?d?.

snip from man vxreattach


/etc/vx/bin/vxreattach -c c1t2d0

If reattachment is possible, vxreattach returns with an exit status of 0 and displays the disk group name and disk media name. If reattachment is not possible, vxreattach returns an exit status of 2 and displays an error.

Attempt to reattach the disk in the foreground and try to recover stale plexes of any volumes on the disk:

/etc/vx/bin/vxreattach -r c1t2d0

If the reattachment is successful, vxreattach returns an exit status of 0. Otherwise, if an error occurs, vxreattach returns a non-zero exit code as defined on vxintro(1M).

HTHS
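A hedged sketch of applying that advice to the situation above -- the abbreviated device list and the final vxrecover step are assumptions, so check vxreattach(1M) and vxrecover(1M) before running anything against live data:

# vxdctl enable                          (rescan; should bring the fabric_* devices back)
# vxdisk list                            (confirm the devices show as online again)
# for d in fabric_0 fabric_1 fabric_2; do          (repeat for all twelve devices)
>     /etc/vx/bin/vxreattach -c $d && /etc/vx/bin/vxreattach -r $d
> done
# vxrecover -g datadg -sb                (assumption: restart and resync the volumes in the background)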

posted by Brahma at 3:48 PM 0 comments

Interpretation of patchadd -p output

Hi Martin:

pca is a great tool! I downloaded the patchdiag.xref manually (because wget and our proxy do not work well together) and ran ./pca -i > installed.txt or ./pca -u > uninstalled.txt and checked the patch number. Much better than patchadd -p :).

Thank you very much.
Mike

Martin Paul wrote:
> MikeHT wrote:
> > Does patch 111177-06 need to be added or is the server up to date with
> > the patch on SunOS 5.8? Thank you in advance.
>
> Patch 111177-06 has been obsoleted by patch 108827-15, which in
> turn has been obsoleted by 108993-18. You have 108993-39 installed,
> so you do not need to install 111177-06.
>
> A tool like pca ( http://www.par.univie.ac.at/solaris/pca/ ) can
> help you analyze such issues. Try:
>
> wget http://www.par.univie.ac.at/solaris/pca/pca
> chmod +x pca
> ./pca -x
>
> If the report of missing patches doesn't show 111177, you don't need it.
>
> mp.
> --
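A hedged sketch of checking a single patch with the exact invocation quoted above (the report format may differ between pca versions):

$ wget http://www.par.univie.ac.at/solaris/pca/pca
$ chmod +x pca
$ ./pca -x > report.txt
$ grep 111177 report.txt || echo "111177 not listed, so not needed"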

posted by Brahma at 3:48 PM 0 comments

Listing suspended and background processes

Listing suspended and background processes

When a process is running, backgrounded or suspended, it will be entered onto a list along with a job number. To examine this list, type

% jobs

An example of a job list could be

[1] Suspended        sleep 100
[2] Running          netscape
[3] Running          nedit

To restart (foreground) a suspended process, type

% fg %jobnumber

For example, to restart sleep 100, type

% fg %1

Typing fg with no job number foregrounds the last suspended process.
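For completeness, a small sketch of the related job-control commands (standard csh behaviour):

% sleep 100
^Z                      (Ctrl-Z suspends the job)
Suspended
% bg %1                 (resume job 1 in the background)
% kill %1               (or terminate it)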

posted by Brahma at 3:47 PM 0 comments

history

history

The C shell keeps an ordered list of all the commands that you have entered. Each command is given a number according to the order it was entered.

% history (show command history list)


If you are using the C shell, you can use the exclamation character (!) to recall commands easily.

% !! (recall last command)

% !-3 (recall third most recent command)

% !5 (recall 5th command in list)

% !grep (recall last command starting with grep)

You can increase the size of the history buffer by typing

% set history=100
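To make the larger buffer permanent, and keep it across logins, the usual csh approach is to set it in ~/.cshrc; a small sketch (savehist is optional):

% cat >> ~/.cshrc
set history=100
set savehist=100
^D                      (Ctrl-D ends the cat)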


posted by Brahma at 3:46 PM 0 comments

running software

Running the software

You are now ready to run the software (assuming everything worked).

% cd ~/units174

If you list the contents of the units directory, you will see a number of subdirectories:

bin     The binary executables
info    GNU info formatted documentation
man     Man pages
share   Shared data files

To run the program, change to the bin directory and type

% ./units

As an example, convert 6 feet to metres.

You have: 6 feet

You want: metres

* 1.8288


If you get the answer 1.8288, congratulations, it worked.

To view what units it can convert between, view the data file in the share directory (the list is quite comprehensive).

To read the full documentation, change into the info directory and type

% info --file=units.info

7.7 Stripping unnecessary code

When a piece of software is being developed, it is useful for the programmer to include debugging information in the resulting executable. This way, if there are problems encountered when running the executable, the programmer can load the executable into a debugging software package and track down any software bugs.

This is useful for the programmer, but unnecessary for the user. We can assume that the package, once finished and available for download, has already been tested and debugged. However, when we compiled the software above, debugging information was still compiled into the final executable. Since it is unlikely that we are going to need this debugging information, we can strip it out of the final executable. One of the advantages of this is a much smaller executable, which may also load slightly faster.

What we are going to do is look at the before and after size of the binary file. First change into the bin directory of the units installation directory.

% cd ~/units174/bin

% ls -l

As you can see, the file is over 100 kbytes in size. You can get more information on the type of file by using the file command.

% file units

units: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), not stripped

To strip all the debug and line numbering information out of the binary file, use the strip command

% strip units


% ls -l

As you can see, the file is now 36 kbytes - a third of its original size. Two thirds of the binary file was debug code!

Check the file information again.

% file units

units: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped

HINT: You can use the make command to install pre-stripped copies of all the binary files when you install the package.

Instead of typing make install, simply type make install-strip

posted by Brahma at 3:45 PM 0 comments

Friday, June 03, 2005

disk IO

Have you tried "iostat/sar" to see on which disk a huge number of I/O operations are being executed? "iostat -P" can even give a report for each disk partition. Then you may decide which application program causes that, based on your knowledge of the applications on your server.

Use "serv" (service time, in fact it's response time) in their report as anindicator of disk I/O performance, if it keeps great than 30 (ms), you maythink about to distribute I/O operations on different disk, use disk mirrorand etc.

If a thread (or process) is waiting for one or more outstanding I/Os to complete, it is not using ("occupying", to use your term) any CPU. It is suspended.

Iowait means precisely that the CPU is available.

posted by Brahma at 3:50 PM 0 comments

IOWait

I wrote a doc on this a bit over a year ago, have a read of


http://sunsolve.sun.com/search/document.do?assetkey=1-9-75659-1

Because of all of the misunderstanding associated with IOwait, it is defined to be 0 in Solaris 10.

The really important thing to take away from that document is that IOwait is a subset of idle. You only get IOwait time if there is nothing else ready to run from the dispatch queues.

alan.
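As a small, hedged illustration on pre-Solaris-10 systems, the time in question shows up in the wt column of mpstat:

$ mpstat 5 3            (watch wt, remembering it is accounted out of otherwise idle time)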

posted by Brahma at 3:49 PM 0 comments

watch command monitoring

Monitoring system events in real-time is essential for troubleshooting and debugging. Most Unix and Linux admins are already familiar with ps and top. Add the watch command to your toolkit as well. watch lets you observe the results of running a single command repeatedly, with all of your chosen options. For example, to observe activity on an NIC in real time:

$ watch -d /sbin/ifconfig eth0

Ctrl+c stops it. The -d flag highlights the changes. The default is to rerun the command every two seconds. This can be changed with the -n flag, which in this example is set for every second:

$ watch -n 1 -d /sbin/ifconfig eth0

To monitor memory usage:

$ watch -d free

Keep an eye on the printer queue with:

$ watch lpq

And monitor CPU temperature:

$ watch cat /proc/acpi/thermal_zone/THRM/temperature


In other words, you may watch any command in real-time. Consult the watch man page to learn more.
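Note that watch ships with most Linux distributions but not with older Solaris releases; a rough stand-in in plain Bourne shell (the script name and two-argument interface are my own invention):

#!/bin/sh
# poorwatch.sh: rerun a command every N seconds, e.g.  ./poorwatch.sh 2 df -k
interval=$1; shift
while :; do
    clear
    date
    "$@"
    sleep "$interval"
done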

posted by Brahma at 3:48 PM 0 comments

Tuesday, May 24, 2005

Checking I/O per process

Dan <[email protected]> wrote:
> Hi, in the way that unix keeps a running total of the amount of CPU
> time a process has used, does it also keep track of i/o throughput?

> I have a server that is busy with i/o, and I need to work out which PID
> is using all the i/o. I know I can work it back from which disks are
> busy, and which processes are accessing those disks, but that does not
> show which of the processes is actually doing the reads.

> So, looking for top or prstat, but for i/o

http://users.tpg.com.au/bdgcvb/psio.html
http://www.stormloader.com/yonghuang/freeware/pio.html

posted by Brahma at 3:18 PM 0 comments

kill sunray session

:%Is there a way to cleanly kick off a sun ray user when his session
:%is disconnected, as root?

:%On a workstation, I usually broke the screen saver with the root
:%password and logged the user out manually.

:%On the Sun Ray server (with NSCM), there are no screen locks.
:%It would be nice if root was able to connect to any Sun Ray
:%session by e.g. logging in with the user's login and root's password
:%instead.

:%I know I can kill the user's Xsession process, but this was always
:%the last resort, as a user might have open windows with unsaved
:%changes.

We are going to put an RFE in for this such that there will be an option that the root password will allow the user to connect to an arbitrary user's session. Hopefully, one that is disconnected.


Currently, the cleanest way to kill a Sun Ray session is with utsession -k.

posted by Brahma at 3:17 PM 0 comments

Firefox on CDE control Panel

Hi;

To make it the default browser, Claude Charest suggested the following (on the sunman mail list):

cd /usr/dt/bin
ln -s /where_is_the_new_browser_prg    (e.g. ln -s /appl/firefox/firefox)

AND MODIFY sdtwebclient like this:

X> sun# diff sdtwebclient.orig sdtwebclient
X> 5c5
X> < DEFAULT_BROWSER_LIST="netscape mozilla sun_netscape netscape6"
X> ---
X> > DEFAULT_BROWSER_LIST="firefox netscape mozilla sun_netscape netscape6"

Christopher Singleton wrote:

> Greetings,
>
> I'm running a fully patched Solaris 9 on a SB 150. I just
> downloaded Firefox and set it up in my /opt directory, and would like
> to use the browser icon to launch firefox and not the default netscape
> (I'm using CDE, which is needed for one of my apps). I've tooled
> around a bit in /etc/dt and /usr/dt, but can't find exactly what I
> need to modify to get it to launch firefox when I click on the CDE
> browser icon. Any help is greatly appreciated. Thanks,
>

posted by Brahma at 3:17 PM 0 comments

solaris root shell no shell

Back when I was working with some Solaris servers about 3 years ago, we had some problems which were traced back to one of the administrators setting the default root shell to ksh. ksh was the default shell on AIX and HP-UX; of course the Linux default shell was bash.


There was undesirable boot behavior when you set root's DEFAULT shell to anything but the Bourne shell. What we ended up having to do was make the last two lines of the profile:

sleep 60
/usr/bin/ksh

If you don't run into any trouble, great. If you run into particularly strange boot behavior, change root's default shell back to the Bourne shell and see if that fixes it.

Three administrators and I worked this issue, and finally took it up with a senior Sun engineer. The senior Sun engineer at the time explained that you DON'T set the default shell for root to anything but the Bourne shell, because bootup requires the Bourne shell.

posted by Brahma at 3:16 PM 0 comments

Thursday, May 19, 2005

Can't Login As Root From A Remote Terminal ? Telnet / FTP

Can't Login As Root From A Remote Terminal ?

When trying to log in to the Solaris server as the superuser account over the network, you get "Not on system console" and "Connection closed by foreign host" messages, and are then expelled from the system.

Modify the login file (you must be logged on as root):

# vi /etc/default/login

Comment out the /dev/console line:

Looks like:

CONSOLE=/dev/console

Should look like:

#CONSOLE=/dev/console

Exit vi and save the file:

:wq!



I Can Telnet To A Box As Root, But Not ftp As Root

The following error message is returned:

'530 Login incorrect. Login failed.'.

ftp for users other than root works just fine.

An automatic security auditing program created an /etc/ftpusers file, with root as the single entry. Removing root from /etc/ftpusers solves the problem.
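A quick, hedged way to confirm that this is the cause before editing anything:

# grep '^root$' /etc/ftpusers && echo "root is currently denied ftp access"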

Modify the /etc/ftpusers file (you must be logged on as root):

# vi /etc/ftpusers

Comment out the root entry:

Looks like:

root

Should look like:

# root

Exit vi and save the file:

:wq!


posted by Brahma at 10:18 AM 0 comments

No Shell For Root

One of the most common problems users request assistance for is the loss of root access shortly after making changes to root's default shell. Users like to manually change the shell of the root account to their shell of choice. The easiest mistake to make is to change the default shell "/sbin/sh" to "/sbin/ksh". The next time you try to log in as root you will find that you are unable to log on, because there is "No shell". The only way to fix the problem is to change root's shell from /sbin/ksh to /usr/bin/ksh in /etc/passwd. Since /etc/passwd is owned by root and you can no longer log on as root, how do you solve the problem?

The problem is caused because the user does not know that there are no shells other than the Bourne shell in /sbin. Therefore the system cannot find /sbin/ksh, and you fail to log on as root because of "No shell". /sbin/sh is a statically linked copy kept in the root (/) file system to make the system usable even before the /usr file system is mounted.
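A hedged sanity check worth doing in another window before ever logging out after changing root's shell (paths as in the corrected example below):

# grep '^root:' /etc/passwd
root:x:0:1:Super-User:/:/usr/bin/ksh
# ls -l /usr/bin/ksh            (make sure the shell you just configured really exists)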

Boot the System using the Solaris CD and Installation Boot Disk:

Insert the Solaris x86 CDROM in the drive.
Boot with the installation diskette (or the correct Driver Update Boot diskettes).

Boot from CDROM:

Select the cdrom to boot when presented with the available devices

[ ] Disk: Target 0
[x] CD  : Target 0

Press the <F2> key to continue.

Boot to Single user mode from CD:

When asked to select Interactive or Jumpstart installation, type the following:

Select type of installation: boot -s

Press the <enter> key to continue.

Note: You have approximately 30 seconds to make this decision before the system boots into the interactive installation.

Mount the root drive:

# mount /dev/dsk/cxdxsx /a      # for IDE drives, replace x with the value that reflects your drive

or

# mount /dev/dsk/cxtxdxsx /a    # for SCSI drives, replace x with the value that reflects your drive


Set your terminal mode for editing:

# TERM=at386;export TERM

Edit the passwd file and correct the path to root's shell:

# vi /a/etc/passwd

Line looks like:

root:x:0:1:Super-User:/:/sbin/ksh

When completed should look like:

root:x:0:1:Super-User:/:/usr/bin/ksh

Exit vi and save the passwd file:

:wq!

Un-mount the partition and reboot system:

# cd /
# umount /a
# init 6

posted by Brahma at 10:17 AM 0 comments

Creating a Solaris 10 x86 jumpstart server on a Netra t1 running Solaris 9

Message: 5
Date: Wed, 18 May 2005 15:35:21 +0100
From: Loris Serena

Thank you very much to

Chad Mynhier
Dana Sparling
Bill Williams

whose replies (below) have shed a good technical light on the issue.

The way I got this working, though, is slightly different and not as elegant, I'm afraid. (Or, in plain English, a quick and dirty trick!)


As far as I could see, on the Solaris 10 x86 cd 1of4 there is no Boot information, either as a directory or as a ufs filesystem.

# prtvtoc /dev/rdsk/c1t2d0s2
* /dev/rdsk/c1t2d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*   35632 sectors/track
*       1 tracks/cylinder
*   35632 sectors/cylinder
*       1 cylinders
*       1 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*     1215280 18446744073708336336 18446744073709551615
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00          0   1215280   1215279
       2      0    00          0   1215280   1215279   /cdrom
#

So, I put the Solaris 10 x86 DVD into my wintel laptop, gzipped the Boot directory, scp-ed Boot.zip across onto my Netra t1 and unzipped it into /jsboot.

Then on the Netra, I mounted the Solaris 10 x86 cd 1of4, cd'ed into /cdrom/Solaris_10/Tools and ran:

./setup_install_server -t /jsboot/Boot /export/install/sol10x86

Et voilà, job done! ;-)
(Plus running add_to_install_server from cds 2, 3 and 4, of course.)

Cheers

Loris

------ Chad's email ----------------------


The first cd (1 of 4) contains two filesystems, an hsfs filesystem and a ufs filesystem, this latter being the /cdrom/Solaris_10/Tools/Boot subtree. You can't mount an x86 ufs filesystem on SPARC and vice versa because of byte-endian issues.

The only real solution is to install Solaris x86 onto an x86 server directly from CD, set up jumpstart on the x86 server, and then use rsync to copy the jumpstart setup to the Netra.

Note: When copying the jumpstart setup, rsync is the tool to use. Using a simple tar pipe or even rdist won't work; you won't be able to boot off the resulting image. Rsync will do the right thing with all of the special files.

Chad Mynhier
------------------------------------------

------ Dana's email ----------------------
I had a similar problem when I set up my Solaris 9 jumpstart box.
I believe the information you need is on slice 1, so you need to mount it
and use the -t option to point to it.

This is what I did:

Mount the cd:
# mount -r -F hsfs /dev/dsk/c0t6d0s0 /mnt

Make slice 1 mountable and mount it:
# lofiadm -a /dev/dsk/c0t6d0s1 /dev/lofi/1
# mkdir /s1
# mount -o ro /dev/lofi/1 /s1
# cd /mnt/Solaris_9/Tools
# ./setup_install_server -t /s1 /export/install/sparc9

Unmount and delete /dev/lofi/1 when you are finished:
# umount /s1
# lofiadm -d /dev/lofi/1
# rmdir /s1

Your mileage may vary, or Solaris 10 might be completely different.
Bayly Eley has written an interesting document on jumpstart, and it can be found at:
http://www.giac.org/certified_professionals/practicals/gcux/0263.php
I hope this helps.
------------------------------------------

------ Bill's email ----------------------
Hi Loris,

I don't have a Netra, but I'm guessing it's a sparc platform rather than an x86. Assuming that to be true ...


I don't have Solaris 10, but do run Solaris 9 and have been playing around with the cross-platform jumpstart. My remarks are all based on Solaris 9 (4/03) SPARC and x86; however, I suspect that the requirement holds true for Solaris 10.

In the case of Solaris 9 (and prior releases, I think), it seems that the trick is that you must mount the 1st installation CD on its NATIVE platform. In my case I have a Sol9 x86 box that I want to use to jumpstart SunFire (SPARC) machines. I had to do this:

SPARC: (we'll call it SPARCSYSTEM)
# Insert the Sol9 CD1 (of 2) and let it mount (volcheck).
share -F nfs -o ro,anon=0 /cdrom/cdrom0/s0
share -F nfs -o ro,anon=0 /cdrom/cdrom0/s1

X86:
mkdir /tmp/s0 /tmp/s1
mount SPARCSYSTEM:/cdrom/cdrom0/s0 /tmp/s0
mount SPARCSYSTEM:/cdrom/cdrom0/s1 /tmp/s1

cd /tmp/s0/Solaris_9/Tools
# Just like it says in the book:
./setup_install_server -t /tmp/s1 /export/sparc9install

# Finishes...
cd /tmp
umount /tmp/s0
umount /tmp/s1

SPARC:
unshare /cdrom/cdrom0/s0
unshare /cdrom/cdrom0/s1
eject

All this is pretty much right out of the Solaris 9 Installation Guide (IG). The IG then goes on to describe mounting (Sol9, remember) the SPARC CD 2 of 2 directly on the x86 system and finishing up by adding the rest of the installation software right there on the x86:

./add_to_install_server /export/sparc9install

Mr. Paranoid, here, did it -- SPARC mount, install from NFS -- just like CD1.

I have read various remarks (opinions?) about *why* you have to mount/run the CD1 from the native platform; suffice it to say that the consistent opinion seems to be that you DO have to mount the Sol9 CD1 on its NATIVE architecture.

One more thing (Sol9):
You WILL have to mount the CD1 and export it from SPARC, then mount and run setup_install_server from x86.
I (foolishly) tried to export the x86 install area (sparc9install) and do the mount and setup_install_server on the SPARC.
Guess what: setup_install_server will not accept an NFS-mounted directory as the installation target.

Hope some of this helps.
---------------------------------------------

-----Original Message-----
From: Loris Serena
Sent: 18 May 2005 11:28

Subject: Creating a Solaris 10 x86 jumpstart server on a Netra t1 running Solaris 9.

Sunmanagers,

I have a Netra t1 (with a cdrom, not a dvd) running Solaris 9 which I've been tasked to build as a jumpstart server for Solaris 10 x86.

I've mounted Solaris 10 x86 cd 1of4, cd'ed into /cdrom/Solaris_10/Tools and run the following:

# ./setup_install_server /export/install/sol10x86
ERROR: Install boot image /cdrom/Solaris_10/Tools/Boot does not exist
Check that boot image exists, or use [-t] to
specify a valid boot image elsewhere.
#

Has anyone found a way around this or can point me to the right direction?

Thanks in advance

Loris

posted by Brahma at 10:16 AM 0 comments

Open files df -k pfiles lsof


Date: Wed, May 18 2005 2:43pm
From: Ade Fewings

BL wrote:
> Hi all,
> My solaris 9 box is running veritas volume manager 4.0.
> A file system displays the wrong "df -k" usage although I have deleted lots of big
> files.
> The usage of that file system is incorrect. That means, suppose I only used
> 10% of the space, but the "df -k" shows that the file system has used 60%. After a reboot, the
> problem seems solved.
> What's the reason? And what command can I use to fix the problem if I don't want
> to reboot?
> Thanks
>

Probably some processes are still running that have open file handles to stuff you believe has been deleted. The space won't be freed until the file handles all go, which happens at a reboot. 'lsof' and 'pfiles' may help you out.

Ade
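A hedged sketch of tracking the culprits down without a reboot (the mount point is a placeholder):

# df -k /export/data                     (the filesystem that looks too full)
# fuser -c /export/data                  (PIDs with files open on that filesystem)
# pfiles <pid>                           (inspect each PID's open files; restarting the offender frees the space)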

posted by Brahma at 10:14 AM 0 comments

only root can print

==============================================================================
TOPIC: only root can print
http://groups-beta.google.com/group/comp.unix.solaris/browse_thread/thread/eac24e9f06b1ff01
==============================================================================

== 1 of 1 ==
Date: Wed, May 18 2005 12:35pm
From: sheila

I saw this once about a decade ago. Permissions on /dev/null were goofed up by the install of the jetdirect software. I think /dev/null is supposed to be root:sys 666. Make sure you're looking at the perms of the actual file /dev/null points to on your system and not just the permissions on the link.
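A hedged sketch of checking and repairing that (the major/minor numbers in the sample output are just illustrative):

# ls -lL /dev/null                       (-L shows the target of the /dev/null link)
crw-rw-rw-   1 root     sys       13,  2 May 18 12:00 /dev/null
# chown root:sys /dev/null ; chmod 666 /dev/null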


I also recall that the lpsched log file needed to be writable by the user trying to print, or there'd be no printout and no error message. Maybe you're not even using lpsched these days.

Like I said, this was a really long time ago on an older OS.

Hope this helps.
SW

wolfgang mair wrote:
> Hi all,
>
> I'm experiencing a weird problem here.
>
> I'm on solaris 9 and the print jobs are spooled locally and then sent to a
> hp jetdirect box.
>
> If I try to print as a normal user:
>
> $ banner hallo | lp -d hp5mis
> request id is hp5mis-7632 (standard input)
>
> The print job turns up in the lpstat -t output for a few seconds and
> then vanishes. I can also see the job in /var/lp/logs/requests. It
> really looks like it was sent to the printer, but it wasn't. (Can't see
> anything with snoop.)
> Also couldn't find anything in /var/adm/messages.
>
> And, if I do the same as root, everything looks the same, but it works.
>
> Any ideas?
>
> Thank you
>
> Wolfgang

posted by Brahma at 10:13 AM 0 comments

SVM mirroring technique volume manager


Dan Foster wrote:
> On our V20Z with 2x73 GB drives, a SVM resync after replacing a failed
> disk takes about 75 minutes on fast disks.
>
> We are curious why this is so, even though there is only 500 MB
> allocated for the filesystems and 1 GB swap.
>
> So we're wondering if SVM resync touches every single physical block
> of the metadevice, regardless of whether it is allocated to data (in use) or not?
>
> -Dan

Yes, it syncs every block of a partition. It doesn't care whether it's data or not.

/jorgen
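For what it's worth, a hedged way to watch such a resync from the command line:

# metastat | grep -i resync              (submirrors report something like "Resync in progress: NN % done")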

posted by Brahma at 10:12 AM 0 comments

Wednesday, May 18, 2005

Performance Testing

Tried the dd with block sizes of up to 4096k and there was not much difference.

e.g.

time dd bs=64k if=/disk1/abigfile of=/dev/null
10002+1 records in
10002+1 records out

real    2m52.13s
user    0m0.04s
sys     0m14.87s

time dd bs=2068k if=/oraback100n0/backup/DB1DV/RMAN/HO>
309+1 records in
309+1 records out

real    3m48.01s
user    0m0.00s
sys     0m8.20s

And the performance of mkfile is a bundle of fun. Have a look at these:

time mkfile 100m grunt

real    0m22.89s
user    0m0.03s
sys     0m4.00s


time mkfile 650M grunt

real    3m17.82s
user    0m0.21s
sys     0m25.64s

With results of 4MB/s it's good enough to say that something is wrong :) It would be better if you add 'bs=1048576' to your 'dd' command. And, if you can afford it, test the device (/dev/dsk/c?t?d?s?) instead of the filesystem.

You might try running 'iostat -xtcn 1' while you do the test, to see how fast the devices are really being accessed at a given time.
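A hedged sketch of the raw-device test being suggested (the device name is a placeholder -- and only ever read from a slice that holds data, never write to it):

# time dd if=/dev/rdsk/c1t0d0s2 of=/dev/null bs=1048576 count=1024
# iostat -xtcn 1                         (run in a second window while the dd is going)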

posted by Brahma at 3:36 PM 0 comments

Hardware diagnostics

I guess this is the one you are looking for:

/usr/sbin/prtdiag

But these might also be interesting:

/usr/sbin/prtconf
/usr/sbin/prtpicl
/usr/sbin/prtfru
/usr/sbin/devfsadm

Try,

/usr/platform/`uname -m`/sbin/prtdiag

ok test-all

/opt/SUNWvts/bin/sunvts

posted by Brahma at 3:35 PM 0 comments

Check open files

Probably some processes are still running that have open file handles to stuff you believe has been deleted. The space won't be freed until the file handles all go, which happens at a reboot. 'lsof' and 'pfiles' may help you out.

posted by Brahma at 3:34 PM 0 comments


Tuesday, May 17, 2005

SM new host network setup static

Hi all

I've just got my hands on an Ultra5 and have installed Solaris 10 on it. I'm very new to sparcs & Solaris, though I am familiar with linux. Networking seems to be configured partially, except the box doesn't seem to be seeing our nameservers. I can ping IP addresses, but names are not getting resolved, and I get "unknown host". I've amended the /etc/resolv.conf file, but no joy.

Can anyone give me a couple of hints of where I need to look? Or failing that, any really good online resources for Solaris?

Perhaps try:

# cp /etc/nsswitch.conf /etc/nsswitch.conf.orig
# cp /etc/nsswitch.dns /etc/nsswitch.conf

> Can anyone give me a couple of hints of where I need to look? Or
> failing that, any really good online resources for Solaris?

There's also http://docs.sun.com

Something funny is going on, because if I do an ifconfig hme0, it reports my IP as being 152.57.31.230 and my broadcast addr as being 152.57.255.255. This is wrong; they should be 152.71.37.230 & 152.71.255.255. "hostname.hme0" has the correct IP in it.

ifconfig hme0 152.71.37.230

route add default 152.71.0.66

Is this a static IP or DHCP'd assignment?

If static, then double check the settings of:

/etc/hosts
/etc/hostname.hme0
/etc/inet/netmasks
/etc/defaultrouter

Could it be you have "071" instead of "71" in hostname.hme0?
71 in octal is 57 in decimal...


Static IP. Here's the contents of "hosts" -

#
# Internet host table
#
127.0.0.1       localhost
152.71.37.230   yultra-01       loghost
152.71.0.66

Here's contents of "defaultrouter"

152.71.0.66

Here's contents of "hostname.hme0"

yultra-01

Here's contents of "netmasks"

#
# The netmasks file associates Internet Protocol (IP) address
# masks with IP network numbers.
#
#       network-number  netmask
#
# The term network-number refers to a number obtained from the Internet Network
# Information Center.
#
# Both the network-number and the netmasks are specified in
# "decimal dot" notation, e.g:
#
#       128.32.0.0      255.255.255.0
#
152.71.0.0      255.255.000.000

> #
> # Internet host table
> #
> 127.0.0.1       localhost
> 152.71.37.230   yultra-01       loghost
> 152.71.0.66

Hmm, that last line should have a hostname attached to it.
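For reference, a hedged sketch of what a consistent set of files for this host might look like (the router's hostname is an arbitrary example):

/etc/hostname.hme0:     yultra-01
/etc/defaultrouter:     152.71.0.66
/etc/inet/netmasks:     152.71.0.0      255.255.0.0
/etc/hosts:             127.0.0.1       localhost
                        152.71.37.230   yultra-01       loghost
                        152.71.0.66     gw-01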

Other than that... if you're still stuck after fixing /etc/hosts:


# find /etc -type f|xargs ls -l|grep ^-|awk '{print $NF}'|xargs grep '152\.'

See if anything mentions unexpected 152.x IPs or subnets.

If none of that pans out, you *could* run sys-unconfig from the console, but that's a *final* resort and is almost never needed.

(That command undoes a lot of things, mostly relating to networking configuration... but it also zeroes the root password and resets a few other non-networking-related settings.)

It's probably better not to run sys-unconfig; see if you can figure it out by fixing /etc/hosts or doing the 'find', or try other suggestions that others may have.