Huge performance difference in string.EndsWith between Linux and Windows for non Invariant, non en-US. #5612
|
cc: @ellismg Interesting. I don't have that exact setup handy, but on Ubuntu 14.04 with CLI 1.0.0-preview2-002853 and relatively recent rc3 builds, I see:
so much faster than what you're seeing, and actually faster than what I'm seeing on the Windows host, e.g.
But... my LANG is en-US. If I do:
and then run it again, I get numbers much closer to yours:
Not a real solution, but as an experiment, @jskeet, if you change your LANG to en-US, do you see a perf improvement? |
|
If the problem is the LANG setting, then what's happening is that our fast paths for ASCII StartsWith/EndsWith are today specific to en-US and Invariant (see the code here). Unlike on Windows, doing a full linguistic StartsWith or EndsWith is slow because we have to construct ICU searching objects that we can't cache across runs (each object is specific to the target string you are searching for, and we don't maintain a cache of searcher objects). For 1.1 we should look at expanding the fast path to work for ASCII strings if the collation rules for the current locale don't tailor anything in the ASCII range, which would allow this fast path to be hit for locales like en-GB. We could also consider re-implementing IndexOf in terms of some lower-level ICU primitives that we might be able to cache across calls. |
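To make the fast-path idea concrete, here is a rough sketch of the shape such a check could take in user code. `AsciiFastPathSketch`, `IsAscii`, and `EndsWithFast` are hypothetical names invented for illustration; this is not the runtime's implementation, whose actual check may well exclude more characters than this one does. The shortcut is only sound under the assumption stated above, namely that the current locale's collation rules don't tailor anything in the ASCII range.

```csharp
using System;

public static class AsciiFastPathSketch
{
    // Hypothetical helper: true when every char is in the ASCII range.
    public static bool IsAscii(string s)
    {
        foreach (char c in s)
        {
            if (c > 0x7F) return false;
        }
        return true;
    }

    // Hypothetical wrapper, not the runtime's code. Assumes the current
    // locale does not tailor ASCII collation (this would be wrong for
    // locales that do, such as tr-TR).
    public static bool EndsWithFast(string source, string suffix)
    {
        if (IsAscii(source) && IsAscii(suffix))
        {
            // Cheap path: a plain code-unit comparison, no ICU involved.
            return source.EndsWith(suffix, StringComparison.Ordinal);
        }

        // Slow path: full linguistic comparison for the current culture.
        return source.EndsWith(suffix, StringComparison.CurrentCulture);
    }

    public static void Main()
    {
        Console.WriteLine(EndsWithFast("report.txt", ".txt")); // True
        Console.WriteLine(EndsWithFast("report.txt", ".TXT")); // False
    }
}
```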
|
@stephentoub: Bingo! Yes, with $ LANG=en-US dotnet run I get:
Applying the same workaround to running my Noda Time tests halves the total test time, too... (That's where all this started.) |
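An in-process analogue of the LANG workaround would be to pin the culture at startup. Nobody in this thread tested this, so treat it as an assumption: since the fast paths are described as specific to en-US and Invariant, switching the current culture to Invariant should in principle reach them as well. `CulturePinSketch` and `PinInvariant` are hypothetical names. Note that this also changes culture-sensitive formatting and parsing on the affected thread, which may not be acceptable in every application.

```csharp
using System;
using System.Globalization;

public static class CulturePinSketch
{
    // Hypothetical helper: sets the current thread's culture to Invariant
    // and returns the resulting culture name (the empty string for Invariant).
    public static string PinInvariant()
    {
        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
        return CultureInfo.CurrentCulture.Name;
    }

    public static void Main()
    {
        Console.WriteLine("Before: '" + CultureInfo.CurrentCulture.Name + "'");
        Console.WriteLine("After:  '" + PinInvariant() + "'");

        // Culture-sensitive overloads such as EndsWith(string) now compare
        // using the invariant culture on this thread.
        Console.WriteLine("abcdef".EndsWith("def"));
    }
}
```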
|
@jskeet Another option (and I'm not sure if this is possible via the interfaces NUnit exposes, a quick glance over the source doesn't give me much hope) is to use Ordinal or OrdinalIgnoreCase, which will ignore all the ICU gunk. |
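For code you control, the ordinal option is a one-argument change. The sketch below times the culture-sensitive and ordinal overloads of `EndsWith` side by side; the class name, the sample strings, and the iteration count are arbitrary choices made for illustration, and the timings you get will depend on platform and LANG, so none are quoted here.

```csharp
using System;
using System.Diagnostics;

public static class OrdinalVsCulture
{
    // Calls EndsWith repeatedly with the given comparison and counts matches.
    public static int CountMatches(string text, string suffix,
                                   StringComparison comparison, int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (text.EndsWith(suffix, comparison)) matches++;
        }
        return matches;
    }

    public static void Main()
    {
        const int iterations = 1000000; // arbitrary, for illustration only

        var comparisons = new[]
        {
            StringComparison.CurrentCulture, // linguistic, may go through ICU
            StringComparison.Ordinal         // plain code-unit comparison
        };

        foreach (var comparison in comparisons)
        {
            var sw = Stopwatch.StartNew();
            int matches = CountMatches("abcdefghij", "hij", comparison, iterations);
            sw.Stop();
            Console.WriteLine("{0}: {1} matches in {2} ms",
                              comparison, matches, sw.ElapsedMilliseconds);
        }
    }
}
```

`StringComparison.OrdinalIgnoreCase` is the equivalent choice when case should be ignored but linguistic rules are still not needed.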
|
@ellismg: The checks here are deep in the bowels of NUnit - but could be fixed very easily with a patch. It's unfortunate that it's necessary, but it feels like a good practical solution to a very real issue. Will file a feature request now... |
|
@jskeet Thanks! I very much expect we will do the extension of the ASCII fast paths for 1.1, but we may be in a world where linguistic StartsWith and EndsWith are slower for non-ASCII strings or strings where collation for ASCII characters differs (e.g. tr-TR) because of how this stuff is implemented in terms of ICU, so if we can upstream general goodness changes to force ordinal comparisons when you don't need linguistic behavior, that would be great. |
|
Yup. Would be good if every call to |
|
Given how disgusting and unexpected such ICU regressions are likely to be, could there be a syntax added to .NET Core that lets you specify, e.g., StartsWith(arg, forceAscii:true)? |
|
@wpostma: What would |
|
Um, forget I suggested that. :-) |
|
@ellismg Is this actionable for 1.1.0? |
Very simple code:
The project.json is mostly the default from dotnet new:
{
  "buildOptions": {
    "emitEntryPoint": true,
    "optimize": true
  },
  "frameworks": {
    "netcoreapp1.0": {
      "dependencies": {
        "Microsoft.NETCore.App": {
          "type": "platform",
          "version": "1.0.0-*"
        }
      },
      "imports": "dnxcore50"
    }
  }
}
On Linux (both Ubuntu 15.10 and Ubuntu 16.04) the output is something like:
On Windows, on the same hardware, it's:
Version info: .NET Command Line Tools (1.0.0-preview1-002702) on Windows and Ubuntu 15.10; 1.0.0-preview2-002886 on Ubuntu 16.04.
Note that under NUnit, every equality assertion involves three EndsWith calls, making NUnit assertions basically horrifically expensive on Linux...