Wednesday, November 2, 2011
Monitoring disk space usage on Sun Fishworks (7000) ZFS Storage Appliances
We needed a way to have our existing monitoring system alert us if a project was running out of space. There's no single CLI command that shows space usage across all projects and shares, but this bit of ECMAScript 3 will output an easily parsed table:
script
//
// jwasilko@gmail.com
// fishworks' cli user interface doesn't provide a good way to monitor
// disk space of all projects. This is an attempt to make up for that.
//
run('shares');
projects = list();
printf('%-40s %-10s %-10s %-10s\n', 'SHARE', 'AVAIL', 'USED', 'SNAPUSED');
// For each project, walk its shares and print the space properties of each
for (i = 0; i < projects.length; i++) {
    run('select ' + projects[i]);
    shares = list();
    for (j = 0; j < shares.length; j++) {
        run('select ' + shares[j]);
        share = projects[i] + '/' + shares[j];
        used = run('get space_data').split(/\s+/)[3];
        avail = run('get space_available').split(/\s+/)[3];
        snap = run('get space_snapshots').split(/\s+/)[3];
        printf('%-40s %-10s %-10s %-10s\n', share, avail, used, snap);
        run('cd ..');
    }
    run('cd ..');
}
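If you want to feed that table into a monitoring system, here's a rough sketch of a wrapper that could run on the monitoring host. It's only an example: the local script name (project_space), the appliance hostname (sun7310-1), the 50GB threshold, and the Nagios-style exit codes are all assumptions for illustration, not anything the appliance requires.

#!/usr/bin/env python
# Sketch of a monitoring wrapper around the table the script above prints.
# Hypothetical names: the script is saved locally as "project_space" and the
# appliance answers ssh as "sun7310-1". AVAIL values are assumed to be size
# strings like "1.5T", "829G", "512M".
import subprocess, sys

THRESHOLD_GB = 50   # alert if a share has less than this much space available
UNITS = {'K': 1.0 / (1024 * 1024), 'M': 1.0 / 1024, 'G': 1.0, 'T': 1024.0}

def to_gb(value):
    # Convert a size string such as '1.5T' to gigabytes.
    if value and value[-1] in UNITS:
        return float(value[:-1]) * UNITS[value[-1]]
    return float(value) / (1024 ** 3)   # assume plain bytes otherwise

output = subprocess.check_output(['ssh', 'sun7310-1'], stdin=open('project_space'))
problems = []
for line in output.decode().splitlines():
    fields = line.split()
    if len(fields) != 4 or fields[0] == 'SHARE':
        continue                        # skip the header line and anything unexpected
    share, avail = fields[0], fields[1]
    if to_gb(avail) < THRESHOLD_GB:
        problems.append('%s has only %s available' % (share, avail))

if problems:
    print('CRITICAL: ' + '; '.join(problems))
    sys.exit(2)                         # Nagios-style critical
print('OK: all shares have at least %d GB available' % THRESHOLD_GB)
sys.exit(0)

You'd run the appliance script the same way as the replication script shown further down (ssh to the appliance with stdin redirected from the script), with ssh keys set up so no password prompt gets in the way.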
Tuesday, March 22, 2011
Celerra datamover group file doc bug
We're testing NFSv4, which requires a user/group database (either local files or LDAP/NIS) on the datamover.
Username/UID mapping was working properly, but group/GID mapping was not.
The Celerra Naming Services (6.0) doc on page 21 lists the format of the group file as:
groupname:gid:user_list
But the proper format includes a field for the group password:
groupname:password:gid:user_list
The password field is usually unused and just holds a placeholder (x).
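For example, an entry for a hypothetical group dba with GID 500 and two members (all of these values are made up for illustration) would look like:

dba:x:500:ora1,ora2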
Hope this helps someone else avoid the hassle we ran into.
Monday, March 7, 2011
Celerra top talkers & suspicious ops defined
The EMC Celerra datamovers have the ability to log statistics about top talkers, which can be useful for tracking down problems. We run server_stats with these options to get top talker stats:
/nas/bin/server_stats server_2 -top nfs -i 5 -c 60
One thing worth noting is that the output includes a column labeled "NFS Suspicious Ops". There's no documentation on this column, and it took EMC some time to dig up the answer. Here it is:
SUSPICIOUS EVENTS:
One of the TopTalker output columns lists Suspicious Ops/second.
"Suspicious" events are any of the following, which are typical of the patterns seen when viruses or other badly behaved software/users are attacking a system:
CIFS events:
- ACCESS_DENIED returned for FindFirst
- ACCESS_DENIED returned for Open/CreateFile
- ACCESS_DENIED returned for DeleteFile
- SUCCESS returned for DeleteFile
- SUCCESS returned for TruncateFile (size=0)
NFSv2/v3/v4 events:
- NFSERR_ACCES returned for NFS OPEN/LOOKUP/CREATE/DELETE
- NFSERR_ACCES returned for READDIR/READDIRPLUS
- NFS_OK for NFS REMOVE
- NFS_OK for NFS SETATTR (size=0)
Saturday, January 1, 2011
Monitoring share-based replication on Sun Fishworks (7000) Appliances
We use Sun/Oracle's Fishworks (7000) ZFS Storage Appliances to store our Oracle archive logs and to replicate them to our DR datacenter.
We generate more than 2TB of archive logs per day, and ZFS' compression helps knock that down to a somewhat more manageable 500GB a day. Initially we were using project-based replication which was easy to configure, but unfortunately there was not enough parallelism to keep up with our change rate.
Sun suggested setting up replications for each share (we have 16 shares per database cluster) to improve throughput. It's worked well, but the user interface didn't provide an overview of replication status.
Fortunately, the CLI can be scripted using JavaScript, so it was easy to loop over the projects and shares and extract the replication status.
To run the script, just ssh to the appliance and redirect stdin from the script:
ldap1{jwasilko}64: ssh sun7310-1 < replication_status
Pseudo-terminal will not be allocated because stdin is not a terminal.
Password:
Current time: Sun Jan 02 2011 02:35:06 GMT+0000 (UTC)
Share LastSync LastTry NextTry
db/archivelogs_rman10 Sun Jan 02 2011 02:25:13 GMT+0000 (UTC) Sun Jan 02 2011 02:25:13 GMT+0000 (UTC) Sun Jan 02 2011 02:55:00 GMT+0000 (UTC)
db/archivelogs_rman12 Sun Jan 02 2011 02:26:13 GMT+0000 (UTC) Sun Jan 02 2011 02:26:13 GMT+0000 (UTC) Sun Jan 02 2011 02:56:00 GMT+0000 (UTC)
db/archivelogs_rman14 Sun Jan 02 2011 02:27:13 GMT+0000 (UTC) Sun Jan 02 2011 02:27:13 GMT+0000 (UTC) Sun Jan 02 2011 02:57:00 GMT+0000 (UTC)
db/archivelogs_rman16 Sun Jan 02 2011 02:28:13 GMT+0000 (UTC) Sun Jan 02 2011 02:28:13 GMT+0000 (UTC) Sun Jan 02 2011 02:58:00 GMT+0000 (UTC)
db/archivelogs_rman2 Sun Jan 02 2011 02:21:21 GMT+0000 (UTC) Sun Jan 02 2011 02:21:21 GMT+0000 (UTC) Sun Jan 02 2011 02:51:00 GMT+0000 (UTC)
db/archivelogs_rman4 Sun Jan 02 2011 02:22:13 GMT+0000 (UTC) Sun Jan 02 2011 02:22:13 GMT+0000 (UTC) Sun Jan 02 2011 02:52:00 GMT+0000 (UTC)
db/archivelogs_rman6 Sun Jan 02 2011 02:33:18 GMT+0000 (UTC) Sun Jan 02 2011 02:33:18 GMT+0000 (UTC) Sun Jan 02 2011 03:03:00 GMT+0000 (UTC)
db/archivelogs_rman8 Sun Jan 02 2011 02:24:13 GMT+0000 (UTC) Sun Jan 02 2011 02:24:13 GMT+0000 (UTC) Sun Jan 02 2011 02:54:00 GMT+0000 (UTC)
The script is below. I hope it might be useful for you.
script
//
// jwasilko@gmail.com
// fishworks' user interface doesn't provide a good way to monitor
// the health of share-based replication. this is an attempt to make
// up for that.
//
print("Current time: " + new Date());
printf('%-30s %-40s %-40s %-40s\n', "Share", "LastSync", "LastTry", "NextTry");
// Get the list of projects, to iterate over later
run('shares');
projects = list();
// For each project, list the shares
for (projectNum = 0; projectNum < projects.length; projectNum++) {
    run('select ' + projects[projectNum]);
    shares = list();
    // Walk into the share and select replication, then actions
    for (sharesNum = 0; sharesNum < shares.length; sharesNum++) {
        try { run('select ' + shares[sharesNum]); } catch (err) { dump(err); }
        share = projects[projectNum] + '/' + shares[sharesNum];
        run('replication');
        actions = list();
        // Some shares may not have share-specific replication actions,
        // so skip if needed. Otherwise, get the replication status
        if (actions.length > 0) {
            for (actionsNum = 0; actionsNum < actions.length; actionsNum++) {
                try { run('select ' + actions[actionsNum]); } catch (err) { dump(err); }
                lastsync = run('get last_sync').split(/=/)[1];
                lastsync = lastsync.replace(/\n/, "");
                lasttry = run('get last_try').split(/=/)[1];
                lasttry = lasttry.replace(/\n/, "");
                nextupdate = run('get next_update').split(/=/)[1];
                nextupdate = nextupdate.replace(/\n/, "");
                printf('%-30s %-40s %-40s %-40s\n', share, lastsync, lasttry, nextupdate);
            }
            run('cd ../..');
        } else {
            run('cd ..');
        }
        run('cd ..');
    }
    run('cd ..');
}
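Since we wanted alerting rather than just a report, here's a rough sketch of a check that could consume that output on the monitoring host and complain about stale shares. Everything in it is an assumption for illustration: the one-hour threshold, the fixed 30-character share column (matching the printf widths above), and the Nagios-style exit codes.

#!/usr/bin/env python
# Sketch of a staleness check fed by the replication_status output, e.g.:
#   ssh sun7310-1 < replication_status | python check_replication.py
# Hypothetical assumptions: share names stay under the 30-character column
# width used by the script's printf, and anything that hasn't synced within
# the last hour deserves an alert.
import sys
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=1)
stale = []

for line in sys.stdin:
    share = line[0:30].strip()
    if '/' not in share:      # skip the banner, header, and prompt lines
        continue
    lastsync = line[30:70].strip()
    # Drop the trailing "GMT+0000 (UTC)" so strptime can parse the rest.
    when = datetime.strptime(lastsync.split(' GMT')[0], '%a %b %d %Y %H:%M:%S')
    if datetime.utcnow() - when > MAX_AGE:
        stale.append('%s last synced %s' % (share, lastsync))

if stale:
    print('CRITICAL: ' + '; '.join(stale))
    sys.exit(2)               # Nagios-style critical
print('OK: all replication actions have synced within the last hour')
sys.exit(0)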