Mystery laptop: the conclusion

I never really expected that my post about my laptop outperforming an 8-headed dragon of a server would draw that much attention: a record number of first-day reads for any post on my blog (134), and a record number of comments (11 comments by 5 people, me included). Not a big deal, but I believe you deserve a conclusion, to know how it all ended.

To be honest, it didn’t.

The customer installed SP2, we rebuilt the indexes and updated statistics, and made sure that the client configuration is the same as on my laptop (object cache, etc.). The server did show some improvement: it now completes the job in just over 10 seconds (10.348 being the fastest, to be exact).

There are no obvious bottlenecks on the server. Processor time is below 15%. There is no memory paging. The average disk queue length is below 0.01 at all times. With my laptop, it’s a different story – the processor goes to 84%, with heavy paging (70+ pages/sec) and disk queue length jumping to over 4. Still, it completes the job in 6 seconds.

I honestly can’t think of anything that could draw such poor performance from such a beast. My machine is a general-purpose laptop, for crying out loud, with at least three bottlenecks, and it outperforms a dedicated, optimized, and probably oversized server. Huh?


16 Responses to “Mystery laptop: the conclusion”

  1. BostjanL Says:

    There is one more aspect to consider: the NAV client. When a large amount of data is processed and many lines of C/SIDE code are running, the NAV client uses only one CPU core. So the speed of a single CPU core can be another bottleneck. This is the case in posting and in some periodic activities such as cost adjustment, and possibly in refreshing production orders.
    And a single server CPU core is usually slower than a laptop’s.
    So before SP2 was installed on the server, SQL was the bottleneck, but after that, fetching data is no longer an issue. And since there is no huge amount of data to process, even a laptop can handle it.

  2. Vjeko Says:

    BostjanL: Yes, this makes sense, NAV client is a single-threaded application.
    But I still wonder… On my laptop, the disk is an obvious bottleneck, and there is heavy paging going on, with significant queueing – with a slow disk this should really hurt performance. On the server, everything is obviously performed in memory and no paging occurs, so this should be really fast. I suspect memory to be an issue on the server – either it is slow, or not configured optimally. Since I am no expert on hardware, I can’t really tell…

  3. BostjanL Says:

    No matter how fast data fetching is, there is still more or less processing to be done by the NAV client. And when looking at CPU performance on the server, maybe you should set the affinity of the NAV client (finsql.exe) to only one core (not the first one). By doing so, you can see the CPU consumption of the NAV client more accurately.
    The laptop’s disk is a bottleneck, but the amount of data may be too low for this bottleneck to be a major performance killer.
    And in my experience, the main difference between workstation and server performance is in scalability. A server can handle (many) more concurrent sessions/jobs than a workstation. The speed of a single session/job is not as different as one would expect when comparing the price tags of servers and workstations.

  4. Davor Says:

    My money would be on BostjanL’s explanation as well. In the end you are comparing two different machines which have two different purposes. Although your notebook might be faster at processing a single user’s requests, the server really shines when it’s hit by dozens of users at the same time and still keeps on crunching… That more than justifies its price tag 😉

  5. Vjeko Says:

    BostjanL, Davor: You are both right – server is about scalability. It’s just that I expected performance too.
    At the very least, I would expect comparable speeds. If it were a 10% or 20% speed difference, I’d say nothing. But 77% seems an awful lot, IMHO.
    But still, as I don’t know the exact model and architecture of the server’s processors (other than there are 8 cores of a “Xeon” family as reported by System Properties), I can’t really compare them.
    Thank you both a lot for commenting, and helping me settle with what I have 🙂

  6. mrak Says:

    I believe that your problem lies within the disk subsystem on the server. SQL does not really take advantage of a RAID configuration, and you mentioned that you have the logs moved to another logical partition.
    Since you need both partitions and they are probably far apart on the disk, you have created ideal conditions for spending a lot of time on “seek time”.

    I guess that if you move all of the database data to the same partition, you will get the same or better results.

  7. BostjanL Says:

    SQL does take advantage of RAID. The faster the RAID/disk subsystem, the faster the disk I/O. And there are some ground rules for setting up the disk subsystem for a NAV database.
    Data files should (must) be on RAID 10. RAID 10, because in NAV it is quite hard to move tables to different data files, so all the data is usually in one data file. RAID 5 is good only for low-write, high-read databases, and NAV (like all other ERPs) is a heavy writer.
    For the log file, a separate RAID 1 is recommended. If performance is in question (heavy writing into the database), then RAID 10 can be helpful.
    The system must of course be on a separate array.
    In some cases even the system databases and the temp database must be moved to separate arrays.
    And all these arrays must be physical – they MUST be on separate physical disks.

  8. BostjanL Says:

    I have read the original post again. OMG. The data and log files really are on the same disks. And only 3 disks for data and log files.
    This could be the problem.
    This server is a potential problem, because the number and configuration of disks is not good. It will become more of a problem with more concurrent users.
    But I have seen this kind of config in the past. There is no problem selling CPU or memory, but when it comes to disks, there is always a problem. The problem. You do not have to “count” GB of disk space. You have to count how many disks are spinning.
    And MS has a HW config guide for that.

  9. Vjeko Says:

    BostjanL: I never said that everything resides on same disks. The layout is 3 separate RAID 1+0 arrays (ok, I didn’t say “array” in my original post – hence the misunderstanding). So, it’s 12 physical disks, arranged in 3 RAID 1+0 arrays, with data files residing on one array, and log file residing on another. Many physical disks are spinning.
    Disk performance issues would also have come out in Performance Monitor – but as I said, performance monitor shows no disk queue, whatsoever.
    We do have a Hardware Guide for deployments of Microsoft Dynamics NAV, but in the end it’s the partner and the customer who decide whether to follow the recommendations or not.
    However, in this specific case, the hardware sizing is ok, server is maybe even a little bit oversized. Disks have never been an issue with this server.
    Also, if this was a disk issue, then it should first come out on the laptop, it’s a single physical volume (a 5400 RPM one!!), heavily fragmented, running both the data and log file off itself (well, kinda obvious). And indeed – it sports a HEAVY disk queue. Yet, it completes faster. No way this is a disk issue. Both machines are running EXACTLY the same process over EXACTLY the same data. Not a single bit is different.

  10. Vjeko Says:

    mrak: Sorry, I said “logical volume”, it’s really system, data, and log on separate physical RAID arrays. My error 😦 Anyway, the files layout is optimal, and per recommendation.

  11. Vjeko Says:

    Just as an explanation, I’ve rechecked something myself 🙂 My referring to RAID arrays as “logical volumes” is not wrong. You may have several logical volumes on one physical disk, or one logical volume spanning several physical disks. RAID arrays are logical volumes, comprising several physical disks.

  12. BostjanL Says:

    🙂 So the config is OK. Glad to hear it. I had understood that there were only 3 disks.
    It is not, or at least it shouldn’t be, a data reading/writing issue.
    So let’s get back to the single-core speed issue 😉

    It would be nice to have at least the GHz of both machines. A benchmark would be perfect.

  13. Vjeko Says:

    BostjanL: It’s kinda hard to put three disks in a RAID 1+0, isn’t it 😉
    Anyway, as you first pointed out, this is probably about processor cores, or maybe as I suspect about memory speed (or maybe latency), although processor is far more likely to bear the sole responsibility. Without a clear benchmark, it’s hard to tell.

    mrak: I thought some more about what you said, but couldn’t really figure out why you said that SQL can’t leverage RAID. You had a SQL Server with NAV; was this your experience? It must have been software RAID then. Other than providing fail-over functionality through mirroring and higher contiguous capacity through striping, software RAID can’t really give you any performance advantage if you run it over a single controller. If you had it over two controllers, it gives you some advantage, but that doesn’t even compare with the performance benefits you get from hardware RAID. If you had software RAID, then your SQL Server indeed couldn’t make much use of it.

  14. mrak Says:

    As far as I understand, or at least for some older versions, SQL was supposed to work with RAID 1+0 and not with RAID 3/5 (why, I really do not know).

    The thing is the following: if you have SCSI drives, then you give one disk a task, move to another, give a task to that one, and move on to another… In EIDE, IDE, SATA and other configurations, you really need one disk to finish its operation before moving to another.

    So, while you might have good performance data on your drives, it still does not mean that they perform correctly; in RAID 1+0, especially in an ATA environment, you need twice as much time to write something compared to the read time.

    I strongly believe that you have a problem with your drives. A simple test would be to add another disk, without any RAID, then move all database files to that same disk (and thereby simulate your notebook config) and check for yourself.

    Also, there are a number of moves you can make to improve performance, and one of them is to move some of the SQL data permanently to memory (there is some switch to do that; I am not really familiar with it, but I know that you need to have a UPS in order to use it).

    It has been quite some time since I fiddled with server hardware and I am more than rusty on that, but my “feeling” is that you have a problem with your disk subsystem.

  15. Vjeko Says:

    mrak: In this case, it is RAID 1+0, over SCSI disks, with a hardware RAID controller with native support for RAID 1+0. The Hardware Guide says that this is the optimal disk configuration for Microsoft Dynamics NAV installations. It gives both read and write performance benefits, as well as redundancy. IMHO, disks are not a problem at all in this case, but we will be conducting more tests to conclude definitively where the issue is.

    everyone: Thanks for so much involvement in helping me sort out this issue! You rule! And rock, simultaneously.

  16. mrak Says:

Comments are closed.
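
A footnote on the single-core theory from the comments: if the overall processor counter is an average across all cores (an assumption here, as is treating the 8 reported cores as 8 equally weighted logical processors), then a low total can still hide one core running flat out. The Python sketch below only does that arithmetic. The function names are hypothetical and made up for illustration, and nothing in it was measured on the actual server.

```python
def single_thread_ceiling(cores: int) -> float:
    """Highest total CPU % that one fully busy single-threaded
    process can produce when the counter averages over all cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 100.0 / cores


def busy_core_equivalent(total_cpu_percent: float, cores: int) -> float:
    """Translate an averaged total CPU % into 'how many cores' worth
    of work is that', e.g. 1.0 means one core's worth of load."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return total_cpu_percent * cores / 100.0


if __name__ == "__main__":
    cores = 8        # core count reported for the server in the post
    reading = 15.0   # upper bound of the "below 15%" reading
    print(f"ceiling for one thread: {single_thread_ceiling(cores):.1f}%")
    print(f"busy-core equivalent:   {busy_core_equivalent(reading, cores):.2f}")
```

Under those assumptions, a reading just under 15% on 8 cores is compatible with roughly one core's worth of load, which is why pinning finsql.exe to a single core and watching that core's own counter, as suggested in comment 3, would be more telling than the overall figure.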

