Tuesday, June 1, 2010

Mainframe Performance Topics


WLM Velocity - "Rhetorical Devices Are Us"

MartinPacker | Jan 24 | Tags: pi wlm velocity | Comments (0) | Visits (567)
I'm beginning to look at performance data slightly differently these days...

As well as plotting things by Time Of Day (which our tools have done for 25 years) I'm beginning to plot things more directly against load. (Time Of Day is sort of code for "with load", but not quite - it tells a story people can relate to more directly.)

The first instance of this "with load" approach was plotting CPU per Coupling Facility request (and also request Response Time) against request rate. That's proved invaluable (as you will see in previous blog entries).

The second instance is what I want to talk about now...

I plotted - for a single Service Class Period - velocity against CPU consumed. As it happens I had 200 data points, ranging from almost no CPU to 7 engines or so. CPU consumed is on the x axis and velocity is on the y axis. One further wrinkle: Each day was plotted as a differently coloured set of points (with different marker styles as well), enabling me to compare one day against another.
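
In case you fancy trying something similar with your own data, here's a minimal sketch of that kind of chart in Python with matplotlib. The CSV layout, column names and goal value are my illustrative assumptions - not anything my tooling actually emits.

    # Sketch: velocity against CPU consumed, one marker/colour per day.
    # Assumes a CSV with columns day, cpu_engines, velocity - one row per interval.
    import csv
    from collections import defaultdict
    import matplotlib.pyplot as plt

    points = defaultdict(list)                    # day -> [(cpu, velocity), ...]
    with open("sc_period.csv", newline="") as f:
        for row in csv.DictReader(f):
            points[row["day"]].append((float(row["cpu_engines"]),
                                       float(row["velocity"])))

    markers = ["o", "s", "^", "D", "v", "x"]      # a different marker style per day
    for i, (day, pts) in enumerate(sorted(points.items())):
        xs, ys = zip(*pts)
        plt.scatter(xs, ys, marker=markers[i % len(markers)], label=day, alpha=0.7)

    plt.axhline(40, linestyle="--", label="Goal velocity (example: 40)")
    plt.xlabel("CPU consumed (engines)")
    plt.ylabel("Velocity achieved")
    plt.legend()
    plt.show()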

I'm not going to share the graph with you - as it really would be abusing customer confidence. But suffice it to say it was interesting...

As you go from no workload all the way up to 2n engines the following happens: The velocity starts out low and rapidly rises to well above the goal velocity, staying there until n engines' worth of CPU. Then it steadily but slowly declines to well below the goal velocity. At the highest levels of utilisation the velocity achieved is about 20% of the goal. These highest levels of utilisation, though, appear to be "on the curve" - albeit outliers in x value terms. I think that's an interesting dynamic that says at some point the achievement drops off and can't be sustained at or near the goal level.

The second thing that I noticed was that the points get more widely distributed as utilisation increases - most notably around the point where the velocity starts to drop. It's a most beautiful broadening out. So we get into a position of unstable velocity. Again not a good thing.

Finally, let's consider the days themselves. It turns out they're all pretty much alike, with two exceptions: All the "2n engine" outliers are from one day - a problem day. Also, on the part of the curve where the velocity is dropping away the "problem day" data points are spread both above and below the others. Again we're getting instability of outcome.

I really wish I could share this prototype chart with you - it's got truly beautiful structure. I'm going to "hand create" such a chart a few more times with different customers' data and then "shrink wrap" it into my analysis code. If you get to see it I think you'll like it. It could rapidly grow to be my new favourite rhetorical device. :-)

Of course the above only works for Velocity-based Service Class Periods but I'm sure I could dream up an obvious analogue for the other goal types. (PI might be the unifying concept but it doesn't, in my view, pass the "keep it real" test, not that Velocity is that connected to delivered reality anyway.)

And I share it with you in case it's something you'd like to experiment with.

Going Global

MartinPacker | Jan 22 | Comments (2) | Visits (419)
In the interstices between finishing off a "rush job" piece of analysis for ONE customer and a conference call with a vendor on behalf of ANOTHER it's time to catch up with a piece of news...

As some of you will know I have a new job in IBM...

I've joined Software Group's Worldwide Banking Center of Excellence (WWCoE for short) as their z/OS System Performance person.

So, it's a pretty similar job but with a new focus: The World. :-) And, more specifically, mainly banks. Mainly, but not ENTIRELY, banks. So, I know some of my readership is from customers I've worked with in the past who don't happen to work in banking. I don't regard this as "so long and thanks for all the fish" as far as they are concerned.

And I don't reckon to be doing fewer conferences but (hopefully) more. And already the season is shaping up that way.

For me, I like the travel and I like the chance to work with customers I've not reached yet, some of whom have REALLY thorny problems I can help with.

As my Dad said recently on the phone "it's the job you've always wanted". And now I've got it, watch out World. :-)

If The Cap Doesn't Fit...

MartinPacker | Jan 16 | Tags: memory rmf wlm z/os groups capping resource smf paging | Comments (0) | Visits (2,215)
... swear at it. :-)

No, I KNOW that's not right - but it's (for me) an irresistibly bad pun. And it's a natural reaction, too. :-)

In a recent customer situation I looked at the RMF Workload Activity Report data for a number of service classes. One WLM Sample count was particularly high: "Capped". In fact I look, with tooling, at the SMF data, and the actual field is R723CCCA. (An IBM Development lab HAD looked at the data through the RMF Postprocessor "prism" and come to the same conclusion.)

It turns out, however, that the service classes in question aren't part of any WLM Resource Groups. (There IS a service class that is subject to Resource Group capping but it's not involved here.)

So, how can this be?

A piece of background will help:

The reason I had been asked to look at the SMF data was because a large dump episode had taken rather longer than it should have. It's the usual lesson of "don't dump into already busy page packs". The best way to ensure this doesn't happen is, of course, to dump into memory. (Which might not be affordable, but it IS the best way.)

What had in fact happened was that the system had come under extreme Auxiliary Storage stress. And this had been my suspicion all along.

I'm indebted to Robert Vaupel of WLM Development for confirming this:

Capping delays occur when an address space in the service class is marked non-dispatchable. This can occur when Resource Group capping takes place (switching between non-dispatchable and dispatchable in defined intervals) or when a paging or auxiliary storage shortage occurs and the address space is detected as being the reason for it.

In the above the address spaces are related to dumping, of course.

And the reason I asked Robert was because R723CCCA is populated by a WLM-maintained field (RCAECCAP from IWMRCOLL) - it always pays to understand the source of RMF numbers.

So, if you see values in R723CCCA when Resource Group capping is not in play this might be the cause. I've not seen this documented anywhere.
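
If you wanted to screen your own SMF 72-3 data for this, the logic is simple enough. A minimal sketch follows; it assumes you've already decoded the records into Python dictionaries (the decode step isn't shown) and that you know which service classes are in Resource Groups. The names and numbers are made up.

    # Sketch: flag service class periods reporting "Capped" samples (R723CCCA)
    # when the service class is not in any WLM Resource Group.
    # Assumes records already decoded into dicts - the decoding isn't shown.

    resource_group_members = {"BATCHCAP"}        # illustrative service class names

    def suspicious_capping(periods):
        """Yield (service class, period, capped samples) for capping outside Resource Groups."""
        for p in periods:
            if p["R723CCCA"] > 0 and p["service_class"] not in resource_group_members:
                yield p["service_class"], p["period"], p["R723CCCA"]

    sample_periods = [
        {"service_class": "STCHI",   "period": 1, "R723CCCA": 0},
        {"service_class": "DUMPSRV", "period": 1, "R723CCCA": 742},   # made-up numbers
    ]
    for sc, per, capped in suspicious_capping(sample_periods):
        print(f"{sc} period {per}: {capped} capped samples but no Resource Group "
              "- suspect a paging or auxiliary storage shortage")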

(One thing I'd NOT been crisp about - but Robert firmed up in my mind - is that "Capped" samples have NOTHING to do with Softcapping or LPAR Capping in general. That's a whole 'nother story.)

So, there may be a moral tale here: If you THINK the cap doesn't fit - it might well be the case it doesn't. :-)

What I Did On My Vacation

MartinPacker | Jan 4 | Tags: koffice zip css openoffice linux nodelist jquery php javascript html dom odp dojo | Comments (0) | Visits (914)

First of all, a happy and prosperous 2010 to one and all.

As with most vacations it's been a time partially filled with playing with technology and learning stuff there isn't (legitimate) time to learn about during the rest of the year.

So, lest the rest of this post make you think I ONLY play with web stuff :-) I present to you a short list of REALLY good other things from the past few weeks:

  • Avatar in 3D (as the great Doctor Brian May recommended).
  • Uncharted 2 (on the PS3).
  • Beatles Rock Band (also on the PS3).
  • Neil Gaiman's "American Gods".
  • The company of friends and family.

Now onto the "geek stuff": :-)

My Performance Management tooling (standing on the shoulders of giants, as it happens) produces reports and charts as Bookmaster and GIFs, respectively. (Actually the GIF bit I built mid-year 2009.)

Some time in late 2009 I installed Apache on my Thinkpad - with PHP support. That enabled me to treat my laptop as an automation platform. I also installed Dojo and B2H. (B2H is a NICE but old piece of REXX that takes Bookmaster output and converts it into HTML.)

So this PHP code allows me to download all the GIFs and Bookmaster source and display it on my laptop.

In November I wrote some PHP code to selectively bundle the GIFs into a zip file - to make it easier to share them with colleagues and customers. (If YOU get one from me I hope you can readily unpack and view its contents.)

In mid-December I took this zip code and modified it to create OpenOffice ODP files from selected GIFs. Although they were legitimate ODP files OpenOffice couldn't read them - but KOffice on Linux COULD. And when they were written out again by KOffice, OpenOffice was then able to read them. (I've not got to the bottom of this but it's something to do with some assumptions OpenOffice makes about XML.)

Vacation Learning and Developing

I think it's fair to say I've been using "interstitial" time to play with stuff and get things built.

Learning How To Hack The DOM with jQuery and Dojo

(For those that don't know, jQuery and Dojo are javascript frameworks - free and Open Source.)

The first thing I did was to install jQuery and buy the excellent O'Reilly "jQuery Cookbook". This introduced me to a better way of parsing HTML / XML. It uses CSS selectors as a query mechanism - which is REALLY nice.

The second thing I did was to see if Dojo could do something similar. It turns out that dojo.query is pretty similar and converging on jQuery's capabilities. (1.4 adds some more.) If you're wedded to Dojo (as I am) I recommend you look at dojo.query and (related) NodeList support. It'll make "hacking the DOM" much easier. (And later developments built on this.)

(If you're looking for a good introduction to Dojo try Matthew Russell's "Dojo: The Definitive Guide", also published by O'Reilly. It could do with updating for the next release but it's perfectly fine for 1.4.)

Using PHP To Simplify Dojo Development

I now have a small set of PHP functions I've built up over the months that make it very easy for me to create a web page that takes advantage of Dojo. So, for instance, it's very easy to write the stuff in the "head" and "body" tags to make Dojo create widgets (Dijits) and pull in the necessary CSS and javascript.

One problem I wanted to solve was to prettify the HTML that B2H generates. It's at the 3.2 level and is really not at all "structural" so CSS styling would prove to be a bear. (It has no class or id attributes, for example.)

Dojo can automate (with xhrGet) the asynchronous loading of files from the server. So the first thing I taught my PHP code how to do was to load some HTML and then to insert it (via innerHTML) below a specified element in the web page. (At first I used "id" as the anchor but then used dojo.query (see above) to allow the HTML to be injected ANYWHERE in the page.)

(Because not all the data I want to display in a page is HTML I added a "preprocess the loaded file" capability. So, for example I can now take a newline-separated list of names and wrap each name in an "option" tag.)

So, I can now pull in HTML from a side file. The point is to be able to work on it...

Injecting a CSS link was easy. It's just a static "link" tag.

But some parts of the dragged-in HTML aren't really distinguishable from other parts, so I can't style them differently. So I wrote some more code to post-process the injected HTML (once it's part of the page). For example, a table description acquired a "tdesc" class name - and so CSS selectors can work with that. To do the post-processing I leaned heavily on Dojo's NodeList capability - as it made the coding MUCH easier.

So now, if I show you an HTML report based on your data it should look MUCH prettier. (I've been showing customers their machines and LPARs as pretty ugly HTML.)

Dojo TabContainer Enhancements in 1.4

Some time over the vacation I installed Dojo 1.4 and converted from using 1.3.2.

I hadn't expected this but the dijit.TabContainer widget that I was already using to display GIFs got enhanced in 1.4...

  • Instead of multiple rows of tabs you (by default) now have one - with a drop-down list to display all the tab titles. (Amongst other things this means a PREDICTABLE amount of screen real-estate taken up by the tabs.)
  • Scroll forwards and backwards buttons to allow you to page amongst the tabs. (Actually left and right arrow keys allow scrolling as well.)

Altogether it's a much slicker design. I've opened a couple of enhancement tickets.

These really are "fit and finish" items but they would help with a11y (Accessibility) as well. (I've made contact (via Twitter) with IBM's Dojo a11y advocate and she's aware of these two tickets.)

Conclusion

This has been a long and winding blog post. But I think it illustrates one thing: Through small incremental enhancements (done in "interstitial time") you can make quite large improvements in code. But then, this IS hobbyist code.

I'd also like to think I learnt a lot along the way.

Now to go explain to my manager why I'd like (as a mainframe performance guy) to become a contributor to the Dojo code base. :-)


zAAP CPU Time Bug in Type 72 Record - OA29974 Is The Cure

MartinPacker | Dec 8 2009 | Tags: rmf oa29974 zaap cpu | Comments (1) | Visits (444)

If you use RMF Postprocessor you won't see this one. If you use Service Units rather than CPU seconds fields in the SMF 72-3 record you also won't see it. It's only if (like me) you use the CPU time for zAAPs in your CPU Utilisation calculation that you'll run into this problem.

If you examine fields R723IFAT (zAAP CPU Time) and R723IFCT (zAAP on GCP CPU Time) you might find them zero when you don't expect them to be, i.e. when the Service Units analogues (R723CIFA and R723CIFC) are non-zero. IFAT and IFCT are indeed out of line; CIFA and CIFC (and the Workload Activity Report) are correct.

A (sensible) suggested workaround is to use CIFA and CIFC, converting from service units to CPU time.
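
For what it's worth, here's a minimal sketch of that conversion in Python. It uses the usual relationship between service units and CPU time (service units per second derived from R723MADJ, divided by the service definition CPU coefficient); treat the exact derivation - and whether any zAAP normalisation factor applies in your case - as something to verify against your own data rather than as gospel. The numbers are illustrative only.

    # Sketch: approximate zAAP CPU seconds from the service unit fields,
    # as a workaround when R723IFAT / R723IFCT are zeroed.

    def su_per_second(r723madj):
        """Service units per second for this interval/system."""
        return 16_000_000 / r723madj

    def zaap_cpu_seconds(service_units, r723madj, cpu_coefficient):
        """Convert zAAP (or zAAP-on-GCP) service units to CPU seconds."""
        return service_units / (su_per_second(r723madj) * cpu_coefficient)

    # Illustrative numbers only:
    print(zaap_cpu_seconds(service_units=1_200_000,   # e.g. R723CIFA
                           r723madj=800,              # CPU rate adjustment factor
                           cpu_coefficient=1.0))      # service definition CPU coefficient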

The answer is to apply the fix for APAR OA29974. I think I'd apply it anyway. It seems like a fairly harmless PTF.

I ran into this because a colleague showed me a Workload Activity postprocessor report with substantial zAAP on GCP time in it, whereas MY code showed zero. (My code was correct but misdirected by the data.) :-)

Because I can't control the SMF that customers send me I think I'm going to have to code around this one - if I start seeing this regularly.


Plan Your ESQA Carefully For z/OS Release 11

MartinPacker | Dec 1 2009 | Tags: 78-2 z/os virtual_storage smf ecsa sqa rmf esqa csa | Comments (0) | Visits (490)

Thanks to Marna Walle for pointing out this change:

In z/OS Release 11 there is a requirement for an additional 1608 bytes of ESQA per address space. To put that in context, I'll do some obvious maths: That's about 1.6MB per 1000 address spaces. It just might be of interest to certain customers I know with thousands of CICS regions in a system, or very large TSO or Batch systems. It's probably not enough to trouble most people. But it reminds me of the importance of having a quick virtual storage check when migrating from one major product release to another.

There are several ways of checking for this particular one:

  • You can use Healthchecker VSM_SQA_THRESHOLD check.
  • You can process the SMF 78-2 Virtual Storage record.

The latter would be my favourite as using the SMF 78-2 data to look at usage by time of day can show some useful patterns. You might want to review, for example, whether (E)SQA threatens to overflow into (E)CSA. It's not a big tragedy if that happens but your installation might have views on such things.

(In case you're unfamiliar with such things the "E" in "(E)SQA" and "(E)CSA" refers to 31-bit areas whereas the names without the "E" refer to 24-bit areas, there being analogues above and below the line for both SQA and CSA.)

One other thing - in case you think ESQA and ECSA are unimportant: having very large such areas can impact the 31-bit Private Area virtual storage picture.


DFSORT Does JOIN

MartinPacker | Nov 27 2009 | Tags: uk51706 dfsort join db2 icetool joinkeys uk51707 | Comments (0) | Visits (1,019)

A new set of function was recently made available for DFSORT via PTFs UK51706 and UK51707.

In this post I want to talk about the new JOINKEYS function, and try to add a little value by discussing some performance considerations. I've had the code for a couple of months and have played with it but not extensively. So much of what follows is based on thinking about the function (described in this document) and bringing some of my DB2 experience to bear.

With this enhancement DFSORT allows you to do all the kinds of two-way joins DB2 folks would expect to be able to do - in a single simple operation. "Two way" refers to joining two files together. You can perform e.g. a three-way join by joining two files together and then joining the resulting file with a third. With "raw" DFSORT that would be two job steps. With ICETOOL you can make this a single job step. In any case I think I'd recommend using ICETOOL because converting to ICETOOL later when you find you want to add a third file to the join would be additional work.

How JOINKEYS Works

Before talking about performance let me describe how JOINKEYS works. In JOINKEYS parlance we talk about files "F1" and "F2". Indeed the syntax uses those terms...

  • The join itself is performed by the main DFSORT program task. It receives its data through a special E15 exit and processes it like any other DFSORT invocation, with the exception that it knows it's doing a join. So things like E35 exits and OUTFIL all work as normal.
  • Both F1 and F2 files are read by separate tasks. Each of these writes its data using an E35 exit. Normal processing capabilities such as E15 exits (potentially different for F1 and for F2) and INCLUDE / OMIT and INREC processing apply.
  • The F1 and F2 tasks and the main tasks communicate by "pipes" constructed between the F1 and F2 E35 exits and the main task E15 exit. These pipes have no depth and don't occupy significant working memory or any intermediate disk space.
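
To make that data flow concrete - purely as a Python analogue, NOT DFSORT code - here's a rough sketch of the same shape: two "reader" tasks feeding a main task that joins across what amount to pipes. It assumes both inputs are already in join key order and that keys are unique within each file, and the file names and key positions are made up.

    # Rough analogue of the JOINKEYS data flow: two readers, one merge join.

    def reader(path, key_start, key_len):
        """Stand-in for an F1/F2 task: read records and pass them on with their key."""
        with open(path, encoding="ascii") as f:
            for line in f:
                rec = line.rstrip("\n")
                yield rec[key_start:key_start + key_len], rec     # (key, record)

    def merge_join(f1, f2):
        """Stand-in for the main task: consume both 'pipes', emit joined records."""
        r1, r2 = next(f1, None), next(f2, None)
        while r1 is not None and r2 is not None:
            if r1[0] == r2[0]:
                yield r1[1] + r2[1]          # crude "REFORMAT": concatenate the records
                r1, r2 = next(f1, None), next(f2, None)
            elif r1[0] < r2[0]:
                r1 = next(f1, None)
            else:
                r2 = next(f2, None)

    # for joined in merge_join(reader("f1.txt", 0, 8), reader("f2.txt", 0, 8)):
    #     print(joined)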

A Potential For Parallelism?

So we have three DFSORT tasks operating in parallel, feeding data through pipes. In principle they could run on separate processors. The extent to which that's useful would, I think, depend on whether these tasks are performing sorts or just reformatting copies. I say this because in the copy case I'd expect the F1 and F2 tasks to be interlocked with the main task whereas in the sort case there's stuff to do before we get to writing through the pipes. And in the latter case we're probably only effectively driving two separate processors. But this is a fine point.

In any case we derive I/O Parallelism because the F1 and F2 tasks run in parallel. Again its usefulness depends on timing.

Managing The Sorts

You can specify whether the F1 and F2 tasks perform sorts. So you could declare that F1 was already sorted, whereas F2 wasn't.

You can decide whether DFSORT will terminate if the F1 or F2 files are not in order. (This only applies and makes sense if you've claimed the data was already sorted.)

You can specify whether the main task sorts the results of the joined F1 and F2 files.

More on why sort avoidance might be important in a minute.

Join Order

As I mentioned earlier, you can use repeated invocations of JOINKEYS (most readily using ICETOOL) to join more than two files together.

Now this is where some DB2 SQL tuning background comes in handy...

You have a choice of which order to join the files in. As this isn't DB2 you don't have the Optimizer making such decisions for you, so you have to decide for yourself. But think about it: If you joined a large file to a small file in Step 1 and then joined the large resulting intermediate file to another small file in Step 2 you've chucked a lot of data around - twice. If you could arrange to join the large file in Step 2 to the results of joining the small files in Step 1 there would be less data chucked around. It ought to run faster.
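
To put some (entirely made-up) numbers on that reasoning, here's a tiny sketch comparing the records moved under the two join orders; the counts and match fractions are illustrative assumptions only.

    # Sketch: rough "records chucked around" comparison for two join orders.
    BIG, SMALL1, SMALL2 = 50_000_000, 200_000, 150_000

    # Order A: join BIG to SMALL1 first, then the large intermediate to SMALL2.
    intermediate_a = int(BIG * 0.9)          # assume 90% of BIG's records match
    moved_a = (BIG + SMALL1) + (intermediate_a + SMALL2)

    # Order B: join the two small files first, then BIG to the small intermediate.
    intermediate_b = int(SMALL1 * 0.5)       # still a small file either way
    moved_b = (SMALL1 + SMALL2) + (BIG + intermediate_b)

    print(f"Order A moves about {moved_a:,} records; Order B about {moved_b:,}")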

Cutting Down The Data

As with all DFSORT invocations, cutting down the data early is important: Joining two large files together, only to throw away large amounts of the result, is inefficient. If you can throw away unwanted records on the way in, or can throw away unwanted fields, the join will be more efficient. In the F1 and F2 tasks you can.

In the F1 and F2 tasks you can supply file size estimates - as they each have their own control files - by default "JNF1CNTL" and "JNF2CNTL". You could do this for the main sort, too. In the F1 and F2 case this is more important when you cut down the files on the way in.

Avoiding Unnecessary Sorts

If you know the files you are joining are already sorted in an appropriate order for the join you can avoid sorts on the way into the join. And this will obviously be more efficient. If you can live with the order DFSORT writes the records from JOINKEYS you can use COPY rather than SORT in the main task.

Memory Usage

In the worst case - where F1 and F2 files are sorted in parallel and where the main task also sorts data - you have the potential for large amounts of memory being necessary. You need to cater for that.

In Summary

I really like this function. It removes the need for much fiddliness - and it does it in a simple way. (I'm conscious I've shown no examples but the documentation linked to above is replete with them.)

My perspective is as a performance guy who has some knowledge of how DB2 does joins. This isn't the same code so the lessons from the DB2 Optimizer have to be applied sparingly. And note we don't even have indexes on sequential files (though you could simulate an "index scan" by retrieving only the join keys).

I'd like to do some performance runs that illustrate the points above. I'm a little tight on time right now - so that'll have to wait. And I'm sure there's more thinking that could be done on how to tune JOINKEYS invocations.


Channel Performance Reporting

MartinPacker | Nov 22 2009 | Tags: smf73 z/os.channels rmf smf78-3 | Comments (0) | Visits (806)

Our channel reporting has consisted forever of a single chart. Before I tell you what the chart looked like I'll hazard that your channel reporting was about as bad. :-)

See, it's not something people tend to put much effort into.

Our one-chart report basically listed the top channels, from the perspective of the z/OS system under study, ranked by total channel utilisation descending - as a bar chart. The raw data for this is SMF Type 73. Actually there were two refinements people had made over the decades:

  • Someone acknowledged the existence of the (then-called) EMIF capability to share channels between LPARs in the same machine. So stacked on top of this partition's busy they added other partitions' busy.
  • Someone supported FICON by using the new FICON instrumentation to derive channel utilisation. (Of course if the channel's not FICON we still use the old calculation: with some smart copying involved.)

And that's where we left it until I got my hands on the code...

  • The first thing I did, some months ago, was to add the channel path acronym (for example "FC_S" for "FICON Switched"). This is also in SMF 73.
  • The second thing was much more significant:

    The "other partitions' busy" number is all other partitions' use of the channel, without breaking down which other partitions these are.

  • The third thing was a nice "fit and finish" item: Listing which controllers were attached to which channel.

Which LPARs Share This Channel

Each z/OS image can create its own SMF 73 records. I'm hostage to whichever systems my clients send in data for. Also I have to cut down the potential LPARs in the data. I do this using the following rules (there's a small sketch of the matching after the list):

  • The channel number (in Type 73) has to match.
  • For multiple Logical Channel Subsystem (LCSS) machines (System z9 and System z10) the LCSS number must match. (This can be gleaned from Type 73. Actually Type 70 as well - as each LPAR has only one LCSS.)
  • The machine serial number has to match. (Machine serial number isn't in Type 73. You have to go to the Type 70 for it.)
  • (I do a "belt and braces" check that the Channel Path Acronym (in Type 73) matches.)
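
A minimal sketch of that matching, assuming the SMF 70 and 73 data has already been decoded into simple Python dictionaries (the key names are my shorthand, not the raw SMF field names):

    # Sketch: group per-LPAR channel rows (from SMF 73, enriched with the CPC
    # serial from SMF 70) so that each group represents one physical channel.
    from collections import defaultdict

    def group_shared_channels(rows):
        """rows: dicts with serial, lcss, chpid, acronym, lpar, busy_pct."""
        groups = defaultdict(list)
        for r in rows:
            groups[(r["serial"], r["lcss"], r["chpid"])].append(r)
        for key, members in groups.items():
            acronyms = {m["acronym"] for m in members}
            if len(acronyms) > 1:            # the "belt and braces" acronym check
                print(f"Warning: acronym mismatch on channel {key}: {acronyms}")
        return groups

    # Stacking the members' busy_pct per channel gives the shared-channel picture;
    # whatever the visible LPARs leave unexplained becomes the "Other Busy" slice.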

So that set of checks tells you which LPARs really share the channel. And so you can then stack up their utilisations to gain a better picture of the channel. It's quite nice when you do.

One other thing: Because I don't necessarily see all the LPARs sharing a channel I compute an "Other Busy" number and add that to the stacked bar. In fact my test data showed all the major channels were missing LPARs' contributions.

Which Controllers Are Accessed Using This Channel

To me a channel isn't really interesting until you know what's attached to it. (In my current set of data my test LPAR's data shows one group of four channels attached to five controllers and another group of eight attached to two controllers.)

Working out which controllers are attached is quite fiddly:

  1. Use SMF 78 Subtype 3 (I/O Queuing) records to list the Logical Control Units (LCUs) attached to this channel.
  2. Use some magic code we have to relate LCUs to Cache Controller IDs. Basically it does clever stuff with SMF 74-5 (Cache) and 74-1 (Device) records to tie the two together.

I made a design decision not to annotate the graph with LCU names as there are usually many in a Cache Controller. It would be very cluttered if I had. (I do have another report that lists them and the channels attached to them.) Instead I list the Cache Controller IDs. You can probably relate to Controller IDs. If we've done our homework (and as we use your cache controller serial numbers we generally have) you'll recognise the IDs.

So, if you're one of my customers and I throw up a chart that shows channels and systems sharing them and the controllers attached it may look serene and slick. But believe me, there's a lot of furious paddling that's gone on under the surface. :-)

But I tell you all this in case you're wondering about how to improve your channel reporting. And I still think there's more I can do in this area - particularly with the (more exotic) SMF 74-7 record, which brings FICON Director topology into play. And everything I've said above applies equally to whichever tools you use to crunch RMF SMF, I'm quite sure.


A Few Thoughts On Parallel Sysplex Test Environments

MartinPacker | Nov 10 2009 | Tags: parallel zos sysplex | Comments (0) | Visits (512)
There's a pattern I've seen over a number of test Parallel Sysplex environments over the past few years, a couple of them in situations this year:

It's not much use drawing performance inferences from test environments if they're not set up properly for performance tests.

Sounds obvious, doesn't it?

There are two problem areas I want to draw your attention to:

  1. Shared Coupling Facility Images

    If you run a performance test in an environment with shared coupling facility images you stand to get horrendous request response times and the vast majority of requests going async (given a chance). I've even seen environments where XCF refuses to use coupling facility structures and routes ALL the traffic over CTCs. (And I've seen a couple of environments where there are no CTCs to route it over and XCF traffic is then reduced to a crawl.)
  2. "Short Engine" z/OS Coupled Images

    In a recent customer situation I saw the effect of this: The customer was testing DB2 loads - actually a bunch of SQL inserts. They were also duplexing the LOCK1 structure for the data sharing group. The Coupling Facility setup was perfect, but still response times became really bad once duplexing was established for the LOCK1 structure. Two salient facts: Because of duplexing all the LOCK1 requests were async. XCF list structure request response times were always awful.

    The answer to why this problem occurred lies in understanding how async requests are handled: The coupled z/OS CPU doesn't spin in the async case. In the "low LPAR weight relative to logical engines online" case the z/OS LPAR's logical engines were but rarely dispatched on physical engines. This meant there was a substantial delay in z/OS detecting the completion of an async request. Hence the elongated async response times. As I said, the LOCK1 structure went async once it was duplexed.

    As it happens the physical machine wasn't all that busy: Allowing the LPAR to exceed share - using a soaker job - ensured logical engines remained dispatched on physical engines longer. And, perhaps paradoxically, the async request response times went right down. This, I hope, reassured the customer that in Production (with "longer-engine" coupled z/OS LPARs) async coupling facility response times ought to be OK.
Now, this is just Test. But it could unnecessarily freak people out. Hopefully, though, it's easy to see why Test Parallel Sysplex environments might perform much worse than Production ones.

(I'm guessing you're going "duh, I knew Test would be worse than Prod". :-) But these two cases are specifics of why Test might be even worse compared to Prod than expected.)

Anyhow, I thought they were interesting. I have seen case 1 quite a few times now; case 2 not so much - in fact only once so far.

DDF Performance - Version 3 - but still highly relevant

MartinPacker | Nov 9 2009 | Tags: ddf cotner curt wlm db2 | Comments (0) | Visits (751)

I've just submitted a set of slides to Slideshare. They're not mine, they're not new, they're not even in a modern format. But they are a very good presentation worth preserving...

In 1993 Curt Cotner presented a set of slides on the new DDF Inactive Thread support in Version 3 of DB2. It's still highly relevant and this support was the base on which the Version 4 WLM classification line item was built.

You can find the slides here.

I'd also recommend you go on to read John Arwe's paper on Preemptible-Class SRBs.


Hello World Again!

MartinPacker | Nov 9 2009 | Comments (0) | Visits (584)
Apologies for being away. For a while I couldn't get the blog to operate - as an author. And then I gave up trying. :-(

Count this as a "testing 1-2-3" post but also as notice that I intend to be back with more (hopefully useful) content soon. MUCH has happened in the "intermission". :-)

European System z Tech Conference - Brussels 4-8 May 2009

MartinPacker | May 7 2009 | Tags: twitter | Comments (0) | Visits (999)
I'm reporting what I'm learning (or I think is significant) in conference sessions on Twitter. My Id is "MartinPacker" and I'm using the Hashtag "#zOS09" to tag my posts. Feel free to follow along. In principle you don't even need to sign up to Twitter to do this.

It seems more immediate than posting here.

Oh, and feel free to comment on Twitter using the same tag.


Javascript on z/OS For Beginners - Getting It To Run

MartinPacker | Apr 27 2009 | Tags: javascript rhino java z/os unix | Comments (0) | Visits (1,292)

Here's another "For Beginners" post to encourage people to just leap in and try it...

Javascript is a language popularly used for such things as web pages with some programming in them (and that includes frameworks like dojo), building Firefox extensions, and the Adobe AIR (desktop) runtime. As it happens I'm pretty familiar with Javascript anyway - having done all 3 of the above. This post shows how I got it to run on z/OS using Mozilla's Rhino...

Mozilla's Rhino project is a javascript interpreter written in java. (Java 6 does in fact have a general-purpose scripting interface - with javascript as a prime target.) It's important to note that Rhino requires Java 5.

So here's what I did...

  1. Downloaded the Rhino package to my PC from here.
  2. Unpacked it on my PC using a Zip tool (in my case 7-Zip).
  3. Tested it out on my PC - just to get comfortable with how it worked (and that was well worth the half an hour it took).
  4. FTP'ed the included js.jar file (in binary) to an HFS file (in my case /u//rhino/js.jar). The file was under 1MB in size.
  5. Adjusted my CLASSPATH to point additionally to js.jar (including js.jar explicitly, not just the directory it was in).
  6. Invoked Rhino in interactive mode:
    java org.mozilla.javascript.tools.shell.Main
  7. Typed in a few javascript statements such as
    print(1+2)
  8. Quit by typing "quit()"

You can pass the name of a javascript file to Rhino by adding eg "test.js" to the command to invoke it.

In my case I also used the ".profile" startup script to adjust my CLASSPATH and to alias "js" to mean "run the Rhino javascript interpreter".

So, it's actually VERY straightforward to run Javascript under z/OS - thanks to Rhino. (And I suspect anyone with Java 6 installed would find it even easier.)

As always, if you know better please feel free to comment here. Remember, I'm learning as I go, and I just want to encourage others to try a few things.


z/OS Unix System Services For Beginners - Java Hello World

MartinPacker | Apr 25 2009 | Tags: java unix z/os beginner | Comments (0) | Visits (1,092)

In this post (and any others in a similar vein) I'm going to be displaying a great deal of ignorance - but I think I'm doing it in a good cause.

I know a fair amount about things like Java, XML and C++ - but NOT on z/OS. So I'm determined to learn and present what I learn as "For Beginners" posts here.

The idea is that I'll encourage other people with more traditional z/OS skills to try some simple new things, such as java. So, if you've not done these things before do have a go. Stuff is remarkably straightforward to do.

Here's how I created a simple "Hello World" java application in Unix System Services.

  1. Log onto TSO with a region of 64MB (64000 in the logon panel). When I tried this with 32000 I got a "JVMDBG001" message, complaining of not being able to GETMAIN enough memory to start the JVM (but the JVM started anyway).
  2. Type OMVS. If your userid has been set up to use Unix System Services it should start you in a directory (mine being "/u/userid").
  3. Create a subdirectory using "mkdir javatest".
  4. Change to this subdirectory using "cd javatest".
  5. List what's in this subdirectory (actually nothing) using "ls".
  6. Create a new java source file using "oedit Hello.java".
  7. While in the (ISPF) Editor add the following lines:
    public class Hello {
      public static void main(String args[]) {
        System.out.println("Welcome!");
      }
    }
  8. Press PF3 to save. (Treat the resulting dialog as any "leaving ISPF" dialog.)
  9. Compile the resulting java with "javac Hello.java".
  10. Run the compiled java bytecode with "java Hello".

And that's all there is to it. One note - if you're not familiar with java: Case is significant. Mismatches would cause problems.

The java book I used to get started (and then some) was Deitel & Deitel's "Java How To Program" but there are lots of them to choose from.

If you're more experienced at this than me please feel free to comment as you see fit. But remember this is a "For Beginners" post. I'm trying to encourage people to GET STARTED.


DB2 Data Sharing and XCF Job Name

MartinPacker | Apr 25 2009 | Tags: sysplex irlm xcf r742mjob db2 rmf | Comments (0) | Visits (1,016)

Back in "z/OS R.9 RMF Parallel Sysplex New Fields" (a post from 2007) I mentioned a new field: R742MJOB (XCF Member Job Name).

At the time I had no real customer data so I could only espouse the HOPE that this field would be useful. (When I asked for it to be added to the SMF 74 Subtype 2 record it seemed to me it probably would be.)

Now that z/OS R.9 is "mainstream" I'm seeing lots of data at this level. And so, because it's mainstream, I think it's time to talk some more about this field (and to tell you what I'm seeing in customer data). I think you'll like it.

But first some preliminaries:

XCF Groups and Members

Before there was Parallel Sysplex there was Sysplex. And the main ingredient of (base) Sysplex was XCF signalling. Applications that use XCF consist of groups and members. A group is essentially an application. So the "SYSGRS" group is the GRS application. Members are address spaces participating in the application.

XCF's job is to pass messages within members of the same group, whether on one system or several within the Sysplex.

Instrumentation

Since the advent of XCF we've had group names, member names and messages sent and received in the 74-2 record. But there's a problem here, best illustrated by the example of DB2 Data Sharing:

A DB2 Data Sharing group comprises, amongst other Coupling Facility structures, a "LOCK1" structure. It's a Lock structure and it's called the "LOCK1" structure because its name is always "_LOCK1". We can easily identify this structure and the group it belongs to. Associated with the LOCK1 structure are 2 XCF groups:

  • IXCLOnnn - which is a particularly unhelpful name as ALL Lock structures have such an XCF group associated with them. (There usually being several other Lock structures in a single Parallel Sysplex - such as GRS Star and VSAM RLS's Lock structure.) The member name, by the way, would be something equally unhelpful like "M102".

    The purpose of this group is to resolve potential "False Contention" situations in the Lock1 structure (possibly caused by too small a lock table).

  • DXRabcd - where the "abcd" is the Data Sharing Group Name. This is slightly better but it doesn't help that the member name is e.g. "DXRDPG0$$IPB1003".

    This XCF group is used by the various IRLM address spaces in the group to resolve locking conflicts from DB2's perspective. (IXCLOnnn group traffic resolves to either a False Contention or an XES contention, the latter having to be resolved to either a lock being granted or IRLM negotiation causing the requester to wait.)

So, it would be really handy to isolate these two types of traffic for a particular DB2 Data Sharing group (and perhaps to work on reducing it). As you can see, it's rather hard to do that without more useful information. This is where 74-2 XCF Jobname comes in...

It turns out that for BOTH the IXCLOnnn group and the DXRabcd group the job name is the IRLM address space name. So, if you know the IRLM job name (hopefully because you've given it a name close to that of the Data Sharing group it supports) you can easily tell which XCF groups relate to that Data Sharing group. And then perhaps you can do something about the traffic.
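
If you wanted to do that rollup yourself the logic is just a filter on the job name. A minimal sketch follows, assuming the 74-2 group/member rows are already decoded into Python dictionaries - the key names are my shorthand, with the job name itself coming from R742MJOB, and the IRLM names are made up.

    # Sketch: roll up XCF traffic for one DB2 data sharing group by matching
    # the 74-2 member job name (R742MJOB) against its IRLM address space name(s).
    from collections import Counter

    def traffic_for_irlms(rows, irlm_jobnames):
        """Sum messages sent/received for XCF members owned by the named IRLMs."""
        totals = Counter()
        for r in rows:
            if r["jobname"] in irlm_jobnames:          # e.g. {"IRLMDPG1"} - made up
                totals[(r["group"], "sent")] += r["msgs_sent"]
                totals[(r["group"], "received")] += r["msgs_received"]
        return totals

    # for (group, direction), count in sorted(traffic_for_irlms(rows, {"IRLMDPG1"}).items()):
    #     print(group, direction, count)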

Perhaps I'm displaying my ignorance of DB2 here but in the test case I'm looking at right now I see 2 IRLM address spaces in one system and one in the other, all sharing the same XCF groups - both IXCLOnnn and DXRabcd. I didn't know one could run 2 IRLMs in the same DB2 Data Sharing group in the same LPAR. I'll admit I'm not really sure what to make of that.

But what about other XCF groups? It turns out that they also have good mnemonic address space names. But then for most other groups it's obvious what the XCF group is anyway.

Glancing over a friend's shoulder at a modern RMF XCF Activity Report I didn't see the job name in the report (but rather the member name). But then I, and I expect most other people, don't look at RMF Postprocessor reports all that often.
