What is really helpful is that you can run DynaTrace in production without impacting performance. DynaTrace agents run on the production web/app servers and send data to DynaTrace collectors.
This makes resolving production issues way easier. You sort by CPU time in descending order and work down the list. You already have the stack trace and data associated with the request or SQL query.
Sure, but the issue with Dynatrace and its ilk (New Relic included) is that they still attach a profiler and do have some impact.
Now, I wouldn't typically care, but you can get at least 95% of the same information, including, yes, stack traces, with ETW basically for free - in cost and in perf impact, on Windows at least. Throw the results in a flame graph and boom, done. Windows in particular has so much great built-in stuff for troubleshooting perf, but it's hard to find and not well marketed, so these other folks make a killing selling an inferior product.
ETW is definitely great to get started, and I'm sure you'll find a lot of hotspots with it.
The benefit of APM tools such as Dynatrace (keep in mind that I also work for them) is that they automatically tag & trace requests across thread and runtime boundaries. So, besides stack traces of individual CLRs, you get a full end-to-end (browser to database) view, including captured method arguments and return values where it makes sense, e.g. SQL queries, bind values, web service URLs + parameters. And because we hook into the profiling APIs, we also get good information about GC and the impact GC has on your execution times.
Good news is that all APM vendors provide free trial versions. So give it a try if you want to see whether it gives you more than ETW!
And if you want to give Dynatrace a try - here is the link: http://bit.ly/dtpersonal. I also have a free YouTube channel with tutorials that explain how the whole thing works: http://bit.ly/dtpersonal
I don't know - can you interop with Win32 from Java on Windows? If so, possibly. I doubt Java itself has an ETW provider for the various runtime events (GCs, contention, etc.).
Well, usually via JNI/JNA. Not as pretty as P/Invoke. I guess the lack of PDBs might be an issue for resolving stack traces, even if you don't get GC events.
It's expensive. I think when my last company bought it, it was 15k per license or something astronomical like that.
They bought 20 or so licenses and never actually used them in prod, mostly because of the effort of building up a sane config for ops to use. The guy who set it up did a pretty good job, but when I came to him about the 2 hr/day of stop-the-world GC going on, Dynatrace wasn't able to see it, whereas the JVM logs could.
So yeah - it's a good product, but you'd better hope you have the money for it and the patience to build a good config.
Sampling profilers in general are terrible. Unfortunately I can't get code-instrumenting profilers to work on the Eclipse process.
The worst thing is that a JVM profiler won't tell you how much time the JVM spends JITing and class loading. For short-running processes you might be surprised to find that the profiler tells you everything is okay, yet in my case the program still needed a full second to start.
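One cheap way to see the startup cost a CPU profiler hides is to ask the VM directly through the standard management beans. A minimal sketch (the class name and print labels are made up for illustration):

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class StartupCost {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();

        // Cumulative wall time the JIT has spent compiling, in ms (if the VM reports it).
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("JIT compile time (ms): " + jit.getTotalCompilationTime());
        }
        // Classes loaded so far - startup work that CPU samples won't attribute to your code.
        System.out.println("Classes loaded: " + classes.getLoadedClassCount());
    }
}
```

Printing these at the end of a short-running program gives a rough upper bound on how much of the wall time went to VM bookkeeping rather than your code.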
I was pleasantly surprised that the equivalent Python program needs merely 100 ms from start to finish (essentially feeling instant), even though it should be slower overall.
Instrumenting profilers are also terrible because they make tiny methods look far more expensive than they are, warping the results. Not only does the instrumentation itself cost time, it can prevent methods from being inlined, interfere with optimizations and caching, and generally give misleading results.
For a sampling profiler to give meaningful results, it shouldn't be waiting for safepoints. Ideally it would capture stack traces very cheaply (just walk the stack frames, noting the return addresses) and turn those numbers into symbolic locations on its own time. Yes, inlining is going to confuse things - that's the downside of working with optimized code. Optimization inherently intermingles code that may be separated by some distance, and any one instruction may be a fusion of multiple source lines. If your instrumenting profiler prevents this, you're no longer measuring the actual production code.
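For contrast, here is roughly what the safepoint-biased approach looks like: a toy sampler built on `Thread.getAllStackTraces()`, which can only observe threads at safepoints and therefore has exactly the bias described above. This is a sketch of the flawed technique, not something to ship (the class name is invented):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy safepoint-biased sampler - the *flawed* technique being criticized.
// getAllStackTraces() waits for a global safepoint, so samples cluster at
// safepoint-polling sites instead of where the CPU time actually goes.
public class ToySampler {
    final Map<String, Long> hits = new ConcurrentHashMap<>();

    void sampleOnce() {
        for (StackTraceElement[] frames : Thread.getAllStackTraces().values()) {
            if (frames.length > 0) {
                // Attribute the sample to the top frame only.
                hits.merge(frames[0].toString(), 1L, Long::sum);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToySampler sampler = new ToySampler();
        for (int i = 0; i < 100; i++) {   // ~1 second at 10 ms intervals
            sampler.sampleOnce();
            Thread.sleep(10);
        }
        sampler.hits.forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
```

A profiler that walks stacks asynchronously (outside safepoints) avoids this clustering; the toy above will happily tell you all your time is spent in whatever method happens to contain a convenient safepoint poll.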
Java makes it harder, of course, with code on a GC'd heap, dynamic codegen, etc.
jstat or Java Flight Recorder/Mission Control will give you plenty of info on class loading, JITing, GC, and almost everything else going on in the VM, and there are plugins for VisualVM which will provide a lot of the same information.
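Flight Recorder can even be driven from inside the process via the `jdk.jfr` API (JDK 11+). A minimal sketch, assuming the `jdk.jfr` module is available; the file name and event choices here are arbitrary:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;

public class JfrDemo {
    public static void main(String[] args) throws Exception {
        try (Recording recording = new Recording()) {
            // Built-in JFR event names: GC and class loading cover exactly the
            // VM-side costs most CPU profilers won't show you.
            recording.enable("jdk.GarbageCollection");
            recording.enable("jdk.ClassLoad");
            recording.start();

            System.gc();  // provoke at least one GC event for the demo
            recording.stop();

            Path out = Files.createTempFile("demo", ".jfr");
            recording.dump(out);  // open this file in Mission Control
            System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
        }
    }
}
```

The resulting .jfr file opens directly in Mission Control, where the events are browsable per thread and over time.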
Java mixed-mode flame graphs provide a complete visualization of CPU usage and [...] can identify all CPU consumers and issues, including those that are hidden from other profilers.
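The flame graph side of that pipeline is simple: the tooling consumes "collapsed" stacks, one semicolon-joined stack plus a sample count per line. A sketch of the folding step (the sample traces are made up):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StackCollapse {
    // Fold stack samples (each a root-first list of frames) into the
    // collapsed format flame graph tools read: "frameA;frameB;frameC count".
    static Map<String, Long> collapse(List<List<String>> samples) {
        Map<String, Long> folded = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            folded.merge(String.join(";", stack), 1L, Long::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
                List.of("main", "parse", "lex"),
                List.of("main", "parse", "lex"),
                List.of("main", "gc"));
        collapse(samples).forEach((stack, n) -> System.out.println(stack + " " + n));
        // main;parse;lex 2
        // main;gc 1
    }
}
```

The "mixed-mode" part is about where the samples come from (kernel-level sampling that sees JITed, native, and kernel frames together); the aggregation into a flame graph is just this fold.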
For me, this is a storm in a teacup. I use Netbeans Profiler regularly, and it produces decent results for me.
It's only when the profile starts getting pretty flat that the inaccuracies of this method start showing up, but for run of the mill performance problems where you have nice fat spikes sticking up here and there, it is perfectly adequate.
Throw .NET into this category. I spent weeks of my time, and our clients', wasting effort on not just inaccurate but grossly incorrect results from a paid product. It either confused locks for CPU time (obscuring real hotspots) or simply reported CPU time where there was none.
So far, only VTune has given me actionable results (there may be others); within hours (not weeks) of switching to it we had solved many issues. A profiler is as indispensable as a compiler - don't skimp on one; you're going to end up spending that money one way or another.
I've never ever gotten good results with a Java profiler. HPROF runs, but its results are garbage. YourKit and VisualVM are apparently not able to handle the heat of a CPU-intensive application and either crash or are generally unusable (and this on two different computers with different OSes). If you must know, I was trying to profile a parser.
I have had good results with YourKit. So far, in my experience, it is the only profiler that has been able to survive profiling busy services running large heaps. YMMV.
I've only started profiling Java code this week, and I've been shocked by the poor state of affairs.
Does anyone have any recommendations, preferably ones that can give you (some) information at the line level? I never realised how spoilt I was with the C/C++ options (gcov and valgrind aren't great, but they mostly do the job).
If you want coverage tools then you should be looking for coverage tools, not profiling tools. JaCoCo is a good library that you can plug into your build process, or add as an agent on the command line, and it has an Eclipse plugin. It will tell you about partial and complete coverage of lines, and about the number of branches taken when coverage of a line is incomplete.
Except that you can have a sampling profiler that does better than what most sampling profilers on the JVM are doing.
IOW the point of the blog post is not that sampling profilers are bad or that sampling profilers on the JVM are bad, but that some sampling profilers are bad because they use bad methodology. At no additional cost you could be using profilers that use a more accurate mechanism.
To me that is very useful information. I'm always looking for opportunities to get something for nothing.
I would also not characterize users of sampling profilers as knowing their limitations. Taking a WAG, I'd guess fewer than 50% understand the limitations. A much larger percentage knows limitations exist, but I don't think they actually know what to do about them.