net.digest - September 2001

With HPWorld 2001 carving a week out of August, you might think the number of off-topic posts would be down. You might think that, but you would be wrong. There were several long threads about the "Code Red" virus, with most people in favor of bringing back medieval torture for convicted virus writers. A question about a ping problem that was really a DNS problem morphed into a discussion about the names of ships of discovery. Readers learned that while the US has not named a spacecraft after Darwin’s Beagle, we have named a lunar module after the world’s most famous beagle, Snoopy! There was also a long thread about slide rules - that’s right, slide rules. I guess my favorite, though, was the reference to a satire on HP employees entitled "Loyal Employees A Valuable Asset, So Now Is A Good Time To Sell Them".

But I digress as usual. There was still a lot of very good technical advice delivered on HP3000-L in August. Some of what caught my eye follows.

As always, I would like to hear from readers of net.digest and Hidden Value. Even negative comments are welcome. If you think I’m full of it or goofed, or a horse's behind, let me know. If something from these columns helped you, let me know. If you’ve got an idea for something you think I missed, let me know. If you spot something on HP3000-L and would like someone to elaborate on what was discussed, let me know. Are you seeing a pattern here? You can reach me at john@burke-consulting.com.

The moral of this story is you never know where the next nugget of useful information will turn up.

As if I needed another reason to recommend people subscribe to and read HP3000-L (or at least "Hidden Value" and "net.digest" in the 3000 NewsWire), let me comment on something that happened to me recently. I run two instances of Apache/iX on two separate HP e3000s (you can see one of them at www.burke-consulting.com). These are both basically documentation servers so having search capability is important. Fortunately for me, Lars Appel ported ht://Dig to MPE/iX some time ago. I installed ht://Dig on both servers and have been happily using them, except for one nagging problem. During the search process, ht://Dig would time out every 120 or so pages. After several timeouts, it would start up again and be fine for another 120 or so pages. Traffic is light on both servers and I have not experienced any other issues in the more than one year both web servers have been running. Still, every time I re-indexed, this timeout issue would bug me. After all, it meant half a dozen pages or more were not being indexed. And I just knew that the next time I was looking frantically for an answer to some problem, it would be contained in one of the documents that did not get indexed.

A month or so ago I was reading a posting about a sockets application, even though I've never done any sockets programming and the posting did not seem to have any relevance to any current projects. I came across the following:

"If I recall correctly, sockets can't be reused after they are closed until 4 minutes later (2 * MSL. MSL = Maximum Segment Lifetime). So, if you cycle through a lot of connections really fast, then you may be waiting for whatever is left of the 4 minute timeout for the first sockets that were closed. Take a look in NMMGR at MAX TCP CONNECTIONS and set it to the max (4096 I think)."

If you were in my office when I read this, you would probably have seen the little light bulb go on over my head. Since both servers are lightly used, I had not bothered to change any of the network parameters from the default. Sure enough, MAX TCP CONNECTIONS was set to 128 on both servers. I upped this to the maximum of 4096, changed the scope of my search so that there would be a larger than normal number of pages and fired off ht://Dig. The result? I was able to index over 10,000 pages without a single timeout. So, you just never know where that next golden nugget will turn up.
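
The arithmetic behind that light bulb can be sketched in a few lines of Python. This is only a back-of-the-envelope model, assuming (as the quoted posting suggests) that each closed connection keeps its slot in the connection table for 2 * MSL = 240 seconds. The function names are made up for illustration, and nothing here queries a real HP e3000.

```python
# Simplified model of connection-table exhaustion caused by the 2 * MSL wait.
# Assumption: every closed connection holds its table slot for 240 seconds.

TIME_WAIT_SECONDS = 240  # 2 * MSL, the "4 minutes" from the quoted posting


def sustainable_rate(max_connections: int,
                     hold_seconds: int = TIME_WAIT_SECONDS) -> float:
    """Connections per second that can be opened indefinitely without stalling."""
    return max_connections / hold_seconds


def connections_before_stall(max_connections: int,
                             rate_per_second: float,
                             hold_seconds: int = TIME_WAIT_SECONDS) -> float:
    """Number of connections a steady stream can open before the table fills.

    Returns infinity when slots are released at least as fast as they are used.
    """
    if rate_per_second * hold_seconds <= max_connections:
        return float("inf")
    # The table fills before the first slot is released, so the stall
    # arrives after exactly max_connections connections.
    return float(max_connections)


if __name__ == "__main__":
    for limit in (128, 4096):
        print(limit, round(sustainable_rate(limit), 2),
              connections_before_stall(limit, rate_per_second=2.0))
```

Under these assumptions a table of 128 entries sustains only about 0.53 new connections per second, while 4096 entries sustain about 17 per second. An indexer fetching pages faster than the lower rate would stall after roughly 128 connections, which is in the same ballpark as the "every 120 or so pages" symptom described above.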

Avoid shooting yourself in the foot the next time you’re rummaging around in NMMGR

NMMGR is something that every system manager needs to get comfortable with but few really do. It is the Swiss Army Knife of MPE networking. Like the Swiss Army knife, many of its features are only rarely used. In "Hidden Value" this month there is a question about determining the network configuration of an HP e3000. The answer I chose is simple to apply and quite straightforward. What is not obvious is that NMMGR is actually being run to display the configuration, but in a manner that is completely safe. In the original thread for this question, a number of people suggested just rummaging around in the NMMGR screens to collect the network configuration information. However, as one person pointed out, it is easy to inadvertently change something while rummaging around in NMMGR, the ramifications of which you may not discover for some time. Worse, you think you’ve just been looking around when you get to exit and find the program telling you that you need to VALIDATE. What did you change? You probably have no idea and now you are stuck.

Several people offered suggestions on how to make using NMMGR for informational queries less heart-stopping. John Burke (hey, that’s me) suggested using the "Write access password" feature. You are much less likely to accidentally change something if you have to enter a password first to gain write access. Without the password you can view anything but change nothing. Doug Werth suggested making a copy of NMCONFIG and using the copy for inquiry. Note that the first screen only defaults to NMCONFIG.PUB.SYS; you can change it to the name of your copy and open that instead. It also helps to practice on a copy before making changes to the real NMCONFIG. Either approach will make working with NMMGR a much less nerve-racking experience.

Is there a point where a file does get written to disk for sure?

We’ve all wondered about this at one time or another - usually after a system abort. This came up as an addendum to a discussion about what happens to a new flat file that is being written to when a system abort occurs. [Hint: it must be attached to the transaction manager to survive.] Anyway, I’m going to quote nearly verbatim the response from Kevin Cooper of the HP CSY Performance Team:

"I asked one of our file system experts about this, and here is his reply:

"According to the code, at close time, any change to a file that has been done beyond the EOF value that was written to the file label, will be posted to disk. MPE will not proceed with the close operation until the data between the old EOF and the new EOF has been posted. However, for dirty pages that are less than the file label EOF, these are posted with a "post whenever" option, that schedules these dirty pages with a low I/O priority. Eventually they'll get written either by low priority I/Os, or by high priority memory pressure. However, there's no specific timeframe at which we can guarantee that the pages are posted to disk.

"There is one way to guarantee this posting to disk in an application. If an FCONTROL(2) or FCONTROL(6) (flush data or write EOF, respectively) are done, at the return from the FCONTROL call you are guaranteed that the file data has been written to disk. Of course, if you have multiple accessors, another process could be dirtying the file behind you."

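For readers who also work on POSIX systems, the same "force it to disk, then trust it" idea looks roughly like the sketch below. It is an analogy only: it uses Python's os.fsync rather than the MPE FCONTROL intrinsic, and the helper name write_durably is hypothetical.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data and return only after the OS has been asked to post it to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's user-space buffer into the OS cache
        os.fsync(f.fileno())  # ask the OS to write its cached pages to the device


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "flat.dat")
    write_durably(target, b"record 1\n")
    print("written:", target)
```

As with the FCONTROL calls described in the quote, the guarantee covers only what was written before the call returned; another process writing to the same file afterwards can dirty it again.
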
A programmer asked me if doing an ABORTJOB on a program would cause a broken chain in an IMAGE database.

That's right, blame the question on a programmer. It is amazing how many people think this is not only possible but also probable. I was just putting fingers to keyboard when Gavin Scott came through with a much better explanation than I could have constructed. Gavin graciously consented to my quoting him at length:

"It's amazing how much superstition exists surrounding this kind of stuff, and how many unnecessary rituals and sacrifices are performed daily to appease the mythical pantheon of data integrity gods.

"'Real' broken chains are (supposed to be) impossible to achieve with Image on MPE/iX, no matter what application programs do, or how they are aborted, or even how many times the system crashes!

"The Transaction Manager (XM) provides absolute protection against internal database inconsistencies, as long as there are no bugs in the system and as long as the hardware is not corrupting data. No action or configuration is required on the part of the user.

"Logical inconsistencies (order detail without an associated order header record for example) can easily be created by aborting an application that's in the middle of performing a database update that spans multiple records. Of course Image doesn't care whether your data is logically correct or not, that's the job of application programmers.

"Using DBBEGIN/DBEND will have no affect on logical integrity whatsoever, unless you actually run DBRECOV to roll forward or roll back the database to a consistent point every time you abort a program or suffer any other failure.

"By using the DBXBEGIN/DBXEND 'XM style' transactions, you can extend Image's guarantee of physical integrity to the logical integrity of your database. The system will ensure that no matter what happens, either all changes inside a DBX transaction will be applied, or none of them will be. Of course it's still possible to use this feature incorrectly (locking strategies are non-trivial as you need to lock the data that you read as well as that which you intend to write in many cases).

"MPE/V introduced a feature called Intrinsic Level Recovery (ILR) which could (and still can be) enabled for a database. This was sort of a mini-XM forced updates to disk each time an intrinsic call completed in order to ensure structural integrity of the database in the face of system failures.

"I believe that on MPE/iX, enabling ILR for a database does something really nasty like forcing an XM post after every update intrinsic call, which is a serious performance problem. ILR is no longer required on MPE/iX as XM will ensure integrity without it. With ILR you might be guaranteed that every committed transaction would survive a system abort, whereas without it XM might end up having to rollback the last fraction of a second's worth of transactions. For almost any application this difference is negligible. Do not turn ILR on!

"There are obviously more complexities if your application performs transactions that affect multiple databases or databases and non-database files."

In my personal experience, Gavin’s basic premise that "'Real' broken chains are … impossible to achieve with Image on MPE/iX, no matter what application programs do, or how they are aborted, or even how many times the system crashes!" has held true. This is a graybeard talking here, so pay attention to your elders. I have never encountered a broken chain in an Image database on MPE/iX after a program abort or system abort. Never. Image and MPE/iX are like the Timex watch: they take a licking but keep on ticking.
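
Gavin's distinction between physical and logical integrity, and the all-or-nothing behavior he describes for DBXBEGIN/DBXEND, can be illustrated by analogy. The sketch below uses Python's sqlite3 module, not IMAGE; the table names, the add_order helper, and its fail_after hook are all hypothetical and exist only to simulate a program being aborted between the header and detail updates.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a header table and a detail table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_header (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE order_detail (order_id INTEGER, item TEXT)")
    conn.commit()
    return conn


def add_order(conn, order_id, items, fail_after=None):
    """Insert a header and its detail rows as one all-or-nothing unit.

    fail_after is a test hook: raise after that many detail rows,
    standing in for a program abort in the middle of the update.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO order_header (id) VALUES (?)", (order_id,))
        for n, item in enumerate(items):
            if fail_after is not None and n == fail_after:
                raise RuntimeError("simulated abort")
            conn.execute(
                "INSERT INTO order_detail (order_id, item) VALUES (?, ?)",
                (order_id, item),
            )


def count(conn, table: str) -> int:
    """Return the number of rows in one of the two tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    db = make_db()
    add_order(db, 1, ["widget", "gadget"])
    try:
        add_order(db, 2, ["a", "b", "c"], fail_after=1)
    except RuntimeError:
        pass
    print(count(db, "order_header"), count(db, "order_detail"))
```

The second order fails partway through, and because the whole update sits inside one transaction, neither its header nor its first detail row survives. Without the transaction wrapper, the database would remain structurally sound but would hold exactly the kind of logically inconsistent order that Gavin describes.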

And, finally, how many PINs are enough?

Stan Sieler pointed this out to me. If you are currently on MPE/iX 6.0 or MPE/iX 6.5, then the maximum number of processes you can have is 8190. However, as of MPE/iX 7.0, the maximum number of processes is determined by the amount of memory on the system. From the 7.0 Communicator:

Maximum PINs

Physical Memory        6.5    7.0
--------------------   ----   ----
m <= 256 MB            8190   1000
256 MB < m <= 512 MB   8190   2000
512 MB < m <= 1 GB     8190   4000
1 GB < m <= 2 GB       8190   8190
m > 2 GB               8190   8190

As Stan says, "Note the first three lines. If you bought a 969 or 968 with 256 MB, running 6.5, you could have 8190 processes. Suddenly, surprisingly, on 7.0 HP says, ‘no, we won't let you use that computer that way on 7.0 or later’." Now, before the conspiracy theorists come out of the woodwork, CSY did this for performance reasons and there is nothing magical or evil about the numbers chosen. However, it is something you need to be aware of and prepare for.
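
If you want to check a machine against these limits before upgrading, the table translates into a small lookup. The sketch below is in Python purely for illustration; the function name max_pins and its parameters are made up here, and the numbers are simply transcribed from the Communicator table above.

```python
# Maximum PINs by physical memory (in MB), transcribed from the table above.
_PIN_LIMITS_7_0 = [
    (256, 1000),   # m <= 256 MB
    (512, 2000),   # 256 MB < m <= 512 MB
    (1024, 4000),  # 512 MB < m <= 1 GB
]
MAX_PINS_CEILING = 8190  # limit on 6.0 and 6.5, and on 7.0 above 1 GB


def max_pins(memory_mb: int, release: str = "7.0") -> int:
    """Return the maximum number of processes for a memory size and release."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    if release in ("6.0", "6.5"):
        return MAX_PINS_CEILING
    if release != "7.0":
        raise ValueError("only releases 6.0, 6.5 and 7.0 are covered")
    for upper_bound_mb, limit in _PIN_LIMITS_7_0:
        if memory_mb <= upper_bound_mb:
            return limit
    return MAX_PINS_CEILING


if __name__ == "__main__":
    for mb in (256, 512, 1024, 2048):
        print(mb, max_pins(mb, "6.5"), max_pins(mb, "7.0"))
```

Comparing the result with your peak process count shows whether 7.0 will pinch: a site with 1 GB of memory that peaks above 4000 processes would fit under the 6.5 limit but not under the 7.0 limit.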

On my 959/400 production system, we commonly run around 5000 processes during peak load times. This system has 2 GB of memory, so I do not have to worry. We were running only slightly lower loads when we had just 1 GB, so there may be sites that will be adversely impacted by this change. If you want to check the number of processes at any given time, the following debug statement (courtesy of Stan Sieler) will do the trick:

:debug

nmdebug > wl "Highest in use: ", [$c0000c94]:"#", ", max = ", [$c0000c98]:"#"

Let me know (john@burke-consulting.com) if you think you will exceed this limit and I will pass along the information to CSY.