Is there a published way to take text that may have accent marks and remove the accent marks?

For example:

System.assertEquals('Hello', transform('Ḧéļḻṏ'));

I believe there is an equivalent method in Java, C# and JavaScript. I can't seem to find the Apex equivalent and I'm disturbed that I might have to do it on my own.

You basically just need to transform the answers here to Apex. I'm working on the Map approach but getting some compile issues. – Adrian Larson 10 hours ago

Someone has already published a complete solution. You just need to translate this answer from JavaScript to Apex. The author of that answer credits this post, which was posted by lehel on May 6, 2011 and archived in 2012. I can't vouch for its completeness, but here's the translation:

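public class Accents
{
    // Replaces each accented or variant character with its plain-letter equivalent.
    public static String removeDiacritics(String text)
    {
        // One regex pass per map entry: every character in the class is replaced by the key.
        for (String letter : patterns.keySet())
            text = text.replaceAll(patterns.get(letter), letter);
        return text;
    }
    // Key: replacement letter(s). Value: regex character class of the code points to replace.
    static Map<String, String> patterns = new Map<String, String>
    {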
        'A' => '[\u0041\u24B6\uFF21\u00C0\u00C1\u00C2\u1EA6\u1EA4\u1EAA\u1EA8\u00C3\u0100\u0102\u1EB0\u1EAE\u1EB4\u1EB2\u0226\u01E0\u00C4\u01DE\u1EA2\u00C5\u01FA\u01CD\u0200\u0202\u1EA0\u1EAC\u1EB6\u1E00\u0104\u023A\u2C6F]',
        'AA' => '[\uA732]',
        'AE' => '[\u00C6\u01FC\u01E2]',
        'AO' => '[\uA734]',
        'AU' => '[\uA736]',
        'AV' => '[\uA738\uA73A]',
        'AY' => '[\uA73C]',
        'B' => '[\u0042\u24B7\uFF22\u1E02\u1E04\u1E06\u0243\u0182\u0181]',
        'C' => '[\u0043\u24B8\uFF23\u0106\u0108\u010A\u010C\u00C7\u1E08\u0187\u023B\uA73E]',
        'D' => '[\u0044\u24B9\uFF24\u1E0A\u010E\u1E0C\u1E10\u1E12\u1E0E\u0110\u018B\u018A\u0189\uA779]',
        'DZ' => '[\u01F1\u01C4]',
        'Dz' => '[\u01F2\u01C5]',
        'E' => '[\u0045\u24BA\uFF25\u00C8\u00C9\u00CA\u1EC0\u1EBE\u1EC4\u1EC2\u1EBC\u0112\u1E14\u1E16\u0114\u0116\u00CB\u1EBA\u011A\u0204\u0206\u1EB8\u1EC6\u0228\u1E1C\u0118\u1E18\u1E1A\u0190\u018E]',
        'F' => '[\u0046\u24BB\uFF26\u1E1E\u0191\uA77B]',
        'G' => '[\u0047\u24BC\uFF27\u01F4\u011C\u1E20\u011E\u0120\u01E6\u0122\u01E4\u0193\uA7A0\uA77D\uA77E]',
        'H' => '[\u0048\u24BD\uFF28\u0124\u1E22\u1E26\u021E\u1E24\u1E28\u1E2A\u0126\u2C67\u2C75\uA78D]',
        'I' => '[\u0049\u24BE\uFF29\u00CC\u00CD\u00CE\u0128\u012A\u012C\u0130\u00CF\u1E2E\u1EC8\u01CF\u0208\u020A\u1ECA\u012E\u1E2C\u0197]',
        'J' => '[\u004A\u24BF\uFF2A\u0134\u0248]',
        'K' => '[\u004B\u24C0\uFF2B\u1E30\u01E8\u1E32\u0136\u1E34\u0198\u2C69\uA740\uA742\uA744\uA7A2]',
        'L' => '[\u004C\u24C1\uFF2C\u013F\u0139\u013D\u1E36\u1E38\u013B\u1E3C\u1E3A\u0141\u023D\u2C62\u2C60\uA748\uA746\uA780]',
        'LJ' => '[\u01C7]',
        'Lj' => '[\u01C8]',
        'M' => '[\u004D\u24C2\uFF2D\u1E3E\u1E40\u1E42\u2C6E\u019C]',
        'N' => '[\u004E\u24C3\uFF2E\u01F8\u0143\u00D1\u1E44\u0147\u1E46\u0145\u1E4A\u1E48\u0220\u019D\uA790\uA7A4]',
        'NJ' => '[\u01CA]',
        'Nj' => '[\u01CB]',
        'O' => '[\u004F\u24C4\uFF2F\u00D2\u00D3\u00D4\u1ED2\u1ED0\u1ED6\u1ED4\u00D5\u1E4C\u022C\u1E4E\u014C\u1E50\u1E52\u014E\u022E\u0230\u00D6\u022A\u1ECE\u0150\u01D1\u020C\u020E\u01A0\u1EDC\u1EDA\u1EE0\u1EDE\u1EE2\u1ECC\u1ED8\u01EA\u01EC\u00D8\u01FE\u0186\u019F\uA74A\uA74C]',
        'OI' => '[\u01A2]',
        'OO' => '[\uA74E]',
        'OU' => '[\u0222]',
        'P' => '[\u0050\u24C5\uFF30\u1E54\u1E56\u01A4\u2C63\uA750\uA752\uA754]',
        'Q' => '[\u0051\u24C6\uFF31\uA756\uA758\u024A]',
        'R' => '[\u0052\u24C7\uFF32\u0154\u1E58\u0158\u0210\u0212\u1E5A\u1E5C\u0156\u1E5E\u024C\u2C64\uA75A\uA7A6\uA782]',
        'S' => '[\u0053\u24C8\uFF33\u1E9E\u015A\u1E64\u015C\u1E60\u0160\u1E66\u1E62\u1E68\u0218\u015E\u2C7E\uA7A8\uA784]',
        'T' => '[\u0054\u24C9\uFF34\u1E6A\u0164\u1E6C\u021A\u0162\u1E70\u1E6E\u0166\u01AC\u01AE\u023E\uA786]',
        'TZ' => '[\uA728]',
        'U' => '[\u0055\u24CA\uFF35\u00D9\u00DA\u00DB\u0168\u1E78\u016A\u1E7A\u016C\u00DC\u01DB\u01D7\u01D5\u01D9\u1EE6\u016E\u0170\u01D3\u0214\u0216\u01AF\u1EEA\u1EE8\u1EEE\u1EEC\u1EF0\u1EE4\u1E72\u0172\u1E76\u1E74\u0244]',
        'V' => '[\u0056\u24CB\uFF36\u1E7C\u1E7E\u01B2\uA75E\u0245]',
        'VY' => '[\uA760]',
        'W' => '[\u0057\u24CC\uFF37\u1E80\u1E82\u0174\u1E86\u1E84\u1E88\u2C72]',
        'X' => '[\u0058\u24CD\uFF38\u1E8A\u1E8C]',
        'Y' => '[\u0059\u24CE\uFF39\u1EF2\u00DD\u0176\u1EF8\u0232\u1E8E\u0178\u1EF6\u1EF4\u01B3\u024E\u1EFE]',
        'Z' => '[\u005A\u24CF\uFF3A\u0179\u1E90\u017B\u017D\u1E92\u1E94\u01B5\u0224\u2C7F\u2C6B\uA762]',
        'a' => '[\u0061\u24D0\uFF41\u1E9A\u00E0\u00E1\u00E2\u1EA7\u1EA5\u1EAB\u1EA9\u00E3\u0101\u0103\u1EB1\u1EAF\u1EB5\u1EB3\u0227\u01E1\u00E4\u01DF\u1EA3\u00E5\u01FB\u01CE\u0201\u0203\u1EA1\u1EAD\u1EB7\u1E01\u0105\u2C65\u0250]',
        'aa' => '[\uA733]',
        'ae' => '[\u00E6\u01FD\u01E3]',
        'ao' => '[\uA735]',
        'au' => '[\uA737]',
        'av' => '[\uA739\uA73B]',
        'ay' => '[\uA73D]',
        'b' => '[\u0062\u24D1\uFF42\u1E03\u1E05\u1E07\u0180\u0183\u0253]',
        'c' => '[\u0063\u24D2\uFF43\u0107\u0109\u010B\u010D\u00E7\u1E09\u0188\u023C\uA73F\u2184]',
        'd' => '[\u0064\u24D3\uFF44\u1E0B\u010F\u1E0D\u1E11\u1E13\u1E0F\u0111\u018C\u0256\u0257\uA77A]',
        'dz' => '[\u01F3\u01C6]',
        'e' => '[\u0065\u24D4\uFF45\u00E8\u00E9\u00EA\u1EC1\u1EBF\u1EC5\u1EC3\u1EBD\u0113\u1E15\u1E17\u0115\u0117\u00EB\u1EBB\u011B\u0205\u0207\u1EB9\u1EC7\u0229\u1E1D\u0119\u1E19\u1E1B\u0247\u025B\u01DD]',
        'f' => '[\u0066\u24D5\uFF46\u1E1F\u0192\uA77C]',
        'g' => '[\u0067\u24D6\uFF47\u01F5\u011D\u1E21\u011F\u0121\u01E7\u0123\u01E5\u0260\uA7A1\u1D79\uA77F]',
        'h' => '[\u0068\u24D7\uFF48\u0125\u1E23\u1E27\u021F\u1E25\u1E29\u1E2B\u1E96\u0127\u2C68\u2C76\u0265]',
        'hv' => '[\u0195]',
        'i' => '[\u0069\u24D8\uFF49\u00EC\u00ED\u00EE\u0129\u012B\u012D\u00EF\u1E2F\u1EC9\u01D0\u0209\u020B\u1ECB\u012F\u1E2D\u0268\u0131]',
        'j' => '[\u006A\u24D9\uFF4A\u0135\u01F0\u0249]',
        'k' => '[\u006B\u24DA\uFF4B\u1E31\u01E9\u1E33\u0137\u1E35\u0199\u2C6A\uA741\uA743\uA745\uA7A3]',
        'l' => '[\u006C\u24DB\uFF4C\u0140\u013A\u013E\u1E37\u1E39\u013C\u1E3D\u1E3B\u017F\u0142\u019A\u026B\u2C61\uA749\uA781\uA747]',
        'lj' => '[\u01C9]',
        'm' => '[\u006D\u24DC\uFF4D\u1E3F\u1E41\u1E43\u0271\u026F]',
        'n' => '[\u006E\u24DD\uFF4E\u01F9\u0144\u00F1\u1E45\u0148\u1E47\u0146\u1E4B\u1E49\u019E\u0272\u0149\uA791\uA7A5]',
        'nj' => '[\u01CC]',
        'o' => '[\u006F\u24DE\uFF4F\u00F2\u00F3\u00F4\u1ED3\u1ED1\u1ED7\u1ED5\u00F5\u1E4D\u022D\u1E4F\u014D\u1E51\u1E53\u014F\u022F\u0231\u00F6\u022B\u1ECF\u0151\u01D2\u020D\u020F\u01A1\u1EDD\u1EDB\u1EE1\u1EDF\u1EE3\u1ECD\u1ED9\u01EB\u01ED\u00F8\u01FF\u0254\uA74B\uA74D\u0275]',
        'oi' => '[\u01A3]',
        'ou' => '[\u0223]',
        'oo' => '[\uA74F]',
        'p' => '[\u0070\u24DF\uFF50\u1E55\u1E57\u01A5\u1D7D\uA751\uA753\uA755]',
        'q' => '[\u0071\u24E0\uFF51\u024B\uA757\uA759]',
        'r' => '[\u0072\u24E1\uFF52\u0155\u1E59\u0159\u0211\u0213\u1E5B\u1E5D\u0157\u1E5F\u024D\u027D\uA75B\uA7A7\uA783]',
        's' => '[\u0073\u24E2\uFF53\u00DF\u015B\u1E65\u015D\u1E61\u0161\u1E67\u1E63\u1E69\u0219\u015F\u023F\uA7A9\uA785\u1E9B]',
        't' => '[\u0074\u24E3\uFF54\u1E6B\u1E97\u0165\u1E6D\u021B\u0163\u1E71\u1E6F\u0167\u01AD\u0288\u2C66\uA787]',
        'tz' => '[\uA729]',
        'u' => '[\u0075\u24E4\uFF55\u00F9\u00FA\u00FB\u0169\u1E79\u016B\u1E7B\u016D\u00FC\u01DC\u01D8\u01D6\u01DA\u1EE7\u016F\u0171\u01D4\u0215\u0217\u01B0\u1EEB\u1EE9\u1EEF\u1EED\u1EF1\u1EE5\u1E73\u0173\u1E77\u1E75\u0289]',
        'v' => '[\u0076\u24E5\uFF56\u1E7D\u1E7F\u028B\uA75F\u028C]',
        'vy' => '[\uA761]',
        'w' => '[\u0077\u24E6\uFF57\u1E81\u1E83\u0175\u1E87\u1E85\u1E98\u1E89\u2C73]',
        'x' => '[\u0078\u24E7\uFF58\u1E8B\u1E8D]',
        'y' => '[\u0079\u24E8\uFF59\u1EF3\u00FD\u0177\u1EF9\u0233\u1E8F\u00FF\u1EF7\u1E99\u1EF5\u01B4\u024F\u1EFF]',
        'z' => '[\u007A\u24E9\uFF5A\u017A\u1E91\u017C\u017E\u1E93\u1E95\u01B6\u0225\u0240\u2C6C\uA763]'
    };
}

It does work with your example input and expected output:

system.assertEquals('Hello', Accents.removeDiacritics('Ḧéļḻṏ'));

PS

This approach is painfully slow, taking approximately 1.8 ms per execution. I would avoid using it in a loop where you may attempt to "clean" more than 1,000 strings.
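
For anyone who wants to reproduce a per-call figure like this, here is a minimal timing sketch. The class name AccentsBenchmark and the method averageMillis are hypothetical, it assumes the Accents class above has been saved in the org, and wall-clock numbers will vary with logging and server load:

```apex
// Hypothetical helper: average wall-clock time of Accents.removeDiacritics.
public class AccentsBenchmark
{
    public static Decimal averageMillis(String input, Integer iterations)
    {
        Long started = System.currentTimeMillis();
        for (Integer i = 0; i < iterations; i++)
        {
            Accents.removeDiacritics(input);
        }
        Long elapsed = System.currentTimeMillis() - started;
        // Convert to Decimal first so the division keeps fractional milliseconds.
        return Decimal.valueOf(elapsed) / iterations;
    }
}
```

Calling AccentsBenchmark.averageMillis('Ḧéļḻṏ', 100) from Execute Anonymous and writing the result to System.debug should give a rough average in milliseconds.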

PPS

You can't compile this class in Execute Anonymous, but it will save in a top-level class. I guess there may be a limit to how many map keys you can specify in a literal constructor using Execute Anonymous, but I haven't isolated the cause of the error.

PPPS

As noted by Charles, you can take this concept and greatly increase performance by taking advantage of the getChars and fromCharArray methods. If you map each Unicode character catalogued below to the corresponding letter, it's about ten times faster.
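
Another variation raised in the comments is to compile each pattern once instead of handing a regex string to replaceAll on every call. The sketch below uses a hypothetical class name and only a two-entry subset of the table for illustration. I have not measured whether it is actually faster, so treat that as an open question:

```apex
// Hypothetical variant: the regexes are compiled once, when the class is first loaded.
public class CompiledAccents
{
    // Illustrative subset only; the full table above would be converted the same way.
    private static final Map<String, Pattern> COMPILED = new Map<String, Pattern>
    {
        'A' => Pattern.compile('[\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5]'),
        'e' => Pattern.compile('[\u00E8\u00E9\u00EA\u00EB]')
    };

    public static String removeDiacritics(String text)
    {
        for (String letter : COMPILED.keySet())
        {
            // Reuse the compiled Pattern; only the Matcher is created per call.
            text = COMPILED.get(letter).matcher(text).replaceAll(letter);
        }
        return text;
    }
}
```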

I wonder what the performance on this is? Regardless, do you mind if I throw this into my utility project I've been working on? – sfdcfox 10 hours ago
    
@sfdcfox Really, really slow. Just added the numbers but it's almost 2ms per execution. Granted, that's with some logging turned on, but nearly as low as you can go. As for inclusion, feel free if it's open source! Obviously for either of us to use it we would need to attribute Backbone Paginator. We should really collaborate on that utility project... – Adrian Larson 10 hours ago
    
This is great! I bet if this were keyed the other way around it would work better. Or maybe using already-compiled Patterns? – Charles Koppelman 6 hours ago
    
I don't think key order matters, you need the key and the value both. You could condense it using a wrapper or something, might save you a few microseconds. Reversing the order would make it really torturous to read, though. – Adrian Larson 6 hours ago
    
@AdrianLarson Can you run a benchmark on this vs. my just-added answer that I based on your method? What are you using as a test case? – Charles Koppelman 5 hours ago

This is a revision of Adrian Larson's previous answer for benchmark comparison:

static Map<Integer, String> sec = new Map<Integer, String>{
65=>'A', 9398=>'A', 65313=>'A', 192=>'A', 193=>'A', 194=>'A', 7846=>'A', 7844=>'A', 7850=>'A', 7848=>'A', 195=>'A', 256=>'A', 258=>'A', 7856=>'A', 7854=>'A', 7860=>'A', 7858=>'A', 550=>'A', 480=>'A', 196=>'A', 478=>'A', 7842=>'A', 197=>'A', 506=>'A', 461=>'A', 512=>'A', 514=>'A', 7840=>'A', 7852=>'A', 7862=>'A', 7680=>'A', 260=>'A', 570=>'A', 11375=>'A', 42802=>'AA', 198=>'AE', 508=>'AE', 482=>'AE', 42804=>'AO', 42806=>'AU', 42808=>'AV', 42810=>'AV', 42812=>'AY', 66=>'B', 9399=>'B', 65314=>'B', 7682=>'B', 7684=>'B', 7686=>'B', 579=>'B', 386=>'B', 385=>'B', 67=>'C', 9400=>'C', 65315=>'C', 262=>'C', 264=>'C', 266=>'C', 268=>'C', 199=>'C', 7688=>'C', 391=>'C', 571=>'C', 42814=>'C', 68=>'D', 9401=>'D', 65316=>'D', 7690=>'D', 270=>'D', 7692=>'D', 7696=>'D', 7698=>'D', 7694=>'D', 272=>'D', 395=>'D', 394=>'D', 393=>'D', 42873=>'D', 497=>'DZ', 452=>'DZ', 498=>'Dz', 453=>'Dz', 69=>'E', 9402=>'E', 65317=>'E', 200=>'E', 201=>'E', 202=>'E', 7872=>'E', 7870=>'E', 7876=>'E', 7874=>'E', 7868=>'E', 274=>'E', 7700=>'E', 7702=>'E', 276=>'E', 278=>'E', 203=>'E', 7866=>'E', 282=>'E', 516=>'E', 518=>'E', 7864=>'E', 7878=>'E', 552=>'E', 7708=>'E', 280=>'E', 7704=>'E', 7706=>'E', 400=>'E', 398=>'E', 70=>'F', 9403=>'F', 65318=>'F', 7710=>'F', 401=>'F', 42875=>'F', 71=>'G', 9404=>'G', 65319=>'G', 500=>'G', 284=>'G', 7712=>'G', 286=>'G', 288=>'G', 486=>'G', 290=>'G', 484=>'G', 403=>'G', 42912=>'G', 42877=>'G', 42878=>'G', 72=>'H', 9405=>'H', 65320=>'H', 292=>'H', 7714=>'H', 7718=>'H', 542=>'H', 7716=>'H', 7720=>'H', 7722=>'H', 294=>'H', 11367=>'H', 11381=>'H', 42893=>'H', 73=>'I', 9406=>'I', 65321=>'I', 204=>'I', 205=>'I', 206=>'I', 296=>'I', 298=>'I', 300=>'I', 304=>'I', 207=>'I', 7726=>'I', 7880=>'I', 463=>'I', 520=>'I', 522=>'I', 7882=>'I', 302=>'I', 7724=>'I', 407=>'I', 74=>'J', 9407=>'J', 65322=>'J', 308=>'J', 584=>'J', 75=>'K', 9408=>'K', 65323=>'K', 7728=>'K', 488=>'K', 7730=>'K', 310=>'K', 7732=>'K', 408=>'K', 11369=>'K', 42816=>'K', 42818=>'K', 42820=>'K', 42914=>'K', 76=>'L', 
9409=>'L', 65324=>'L', 319=>'L', 313=>'L', 317=>'L', 7734=>'L', 7736=>'L', 315=>'L', 7740=>'L', 7738=>'L', 321=>'L', 573=>'L', 11362=>'L', 11360=>'L', 42824=>'L', 42822=>'L', 42880=>'L', 455=>'LJ', 456=>'Lj', 77=>'M', 9410=>'M', 65325=>'M', 7742=>'M', 7744=>'M', 7746=>'M', 11374=>'M', 412=>'M', 78=>'N', 9411=>'N', 65326=>'N', 504=>'N', 323=>'N', 209=>'N', 7748=>'N', 327=>'N', 7750=>'N', 325=>'N', 7754=>'N', 7752=>'N', 544=>'N', 413=>'N', 42896=>'N', 42916=>'N', 458=>'NJ', 459=>'Nj', 79=>'O', 9412=>'O', 65327=>'O', 210=>'O', 211=>'O', 212=>'O', 7890=>'O', 7888=>'O', 7894=>'O', 7892=>'O', 213=>'O', 7756=>'O', 556=>'O', 7758=>'O', 332=>'O', 7760=>'O', 7762=>'O', 334=>'O', 558=>'O', 560=>'O', 214=>'O', 554=>'O', 7886=>'O', 336=>'O', 465=>'O', 524=>'O', 526=>'O', 416=>'O', 7900=>'O', 7898=>'O', 7904=>'O', 7902=>'O', 7906=>'O', 7884=>'O', 7896=>'O', 490=>'O', 492=>'O', 216=>'O', 510=>'O', 390=>'O', 415=>'O', 42826=>'O', 42828=>'O', 418=>'OI', 42830=>'OO', 546=>'OU', 80=>'P', 9413=>'P', 65328=>'P', 7764=>'P', 7766=>'P', 420=>'P', 11363=>'P', 42832=>'P', 42834=>'P', 42836=>'P', 81=>'Q', 9414=>'Q', 65329=>'Q', 42838=>'Q', 42840=>'Q', 586=>'Q', 82=>'R', 9415=>'R', 65330=>'R', 340=>'R', 7768=>'R', 344=>'R', 528=>'R', 530=>'R', 7770=>'R', 7772=>'R', 342=>'R', 7774=>'R', 588=>'R', 11364=>'R', 42842=>'R', 42918=>'R', 42882=>'R', 83=>'S', 9416=>'S', 65331=>'S', 7838=>'S', 346=>'S', 7780=>'S', 348=>'S', 7776=>'S', 352=>'S', 7782=>'S', 7778=>'S', 7784=>'S', 536=>'S', 350=>'S', 11390=>'S', 42920=>'S', 42884=>'S', 84=>'T', 9417=>'T', 65332=>'T', 7786=>'T', 356=>'T', 7788=>'T', 538=>'T', 354=>'T', 7792=>'T', 7790=>'T', 358=>'T', 428=>'T', 430=>'T', 574=>'T', 42886=>'T', 42792=>'TZ', 85=>'U', 9418=>'U', 65333=>'U', 217=>'U', 218=>'U', 219=>'U', 360=>'U', 7800=>'U', 362=>'U', 7802=>'U', 364=>'U', 220=>'U', 475=>'U', 471=>'U', 469=>'U', 473=>'U', 7910=>'U', 366=>'U', 368=>'U', 467=>'U', 532=>'U', 534=>'U', 431=>'U', 7914=>'U', 7912=>'U', 7918=>'U', 7916=>'U', 7920=>'U', 7908=>'U', 
7794=>'U', 370=>'U', 7798=>'U', 7796=>'U', 580=>'U', 86=>'V', 9419=>'V', 65334=>'V', 7804=>'V', 7806=>'V', 434=>'V', 42846=>'V', 581=>'V', 42848=>'VY', 87=>'W', 9420=>'W', 65335=>'W', 7808=>'W', 7810=>'W', 372=>'W', 7814=>'W', 7812=>'W', 7816=>'W', 11378=>'W', 88=>'X', 9421=>'X', 65336=>'X', 7818=>'X', 7820=>'X', 89=>'Y', 9422=>'Y', 65337=>'Y', 7922=>'Y', 221=>'Y', 374=>'Y', 7928=>'Y', 562=>'Y', 7822=>'Y', 376=>'Y', 7926=>'Y', 7924=>'Y', 435=>'Y', 590=>'Y', 7934=>'Y', 90=>'Z', 9423=>'Z', 65338=>'Z', 377=>'Z', 7824=>'Z', 379=>'Z', 381=>'Z', 7826=>'Z', 7828=>'Z', 437=>'Z', 548=>'Z', 11391=>'Z', 11371=>'Z', 42850=>'Z', 97=>'a', 9424=>'a', 65345=>'a', 7834=>'a', 224=>'a', 225=>'a', 226=>'a', 7847=>'a', 7845=>'a', 7851=>'a', 7849=>'a', 227=>'a', 257=>'a', 259=>'a', 7857=>'a', 7855=>'a', 7861=>'a', 7859=>'a', 551=>'a', 481=>'a', 228=>'a', 479=>'a', 7843=>'a', 229=>'a', 507=>'a', 462=>'a', 513=>'a', 515=>'a', 7841=>'a', 7853=>'a', 7863=>'a', 7681=>'a', 261=>'a', 11365=>'a', 592=>'a', 42803=>'aa', 230=>'ae', 509=>'ae', 483=>'ae', 42805=>'ao', 42807=>'au', 42809=>'av', 42811=>'av', 42813=>'ay', 98=>'b', 9425=>'b', 65346=>'b', 7683=>'b', 7685=>'b', 7687=>'b', 384=>'b', 387=>'b', 595=>'b', 99=>'c', 9426=>'c', 65347=>'c', 263=>'c', 265=>'c', 267=>'c', 269=>'c', 231=>'c', 7689=>'c', 392=>'c', 572=>'c', 42815=>'c', 8580=>'c', 100=>'d', 9427=>'d', 65348=>'d', 7691=>'d', 271=>'d', 7693=>'d', 7697=>'d', 7699=>'d', 7695=>'d', 273=>'d', 396=>'d', 598=>'d', 599=>'d', 42874=>'d', 499=>'dz', 454=>'dz', 101=>'e', 9428=>'e', 65349=>'e', 232=>'e', 233=>'e', 234=>'e', 7873=>'e', 7871=>'e', 7877=>'e', 7875=>'e', 7869=>'e', 275=>'e', 7701=>'e', 7703=>'e', 277=>'e', 279=>'e', 235=>'e', 7867=>'e', 283=>'e', 517=>'e', 519=>'e', 7865=>'e', 7879=>'e', 553=>'e', 7709=>'e', 281=>'e', 7705=>'e', 7707=>'e', 583=>'e', 603=>'e', 477=>'e', 102=>'f', 9429=>'f', 65350=>'f', 7711=>'f', 402=>'f', 42876=>'f', 103=>'g', 9430=>'g', 65351=>'g', 501=>'g', 285=>'g', 7713=>'g', 287=>'g', 289=>'g', 487=>'g', 
291=>'g', 485=>'g', 608=>'g', 42913=>'g', 7545=>'g', 42879=>'g', 104=>'h', 9431=>'h', 65352=>'h', 293=>'h', 7715=>'h', 7719=>'h', 543=>'h', 7717=>'h', 7721=>'h', 7723=>'h', 7830=>'h', 295=>'h', 11368=>'h', 11382=>'h', 613=>'h', 405=>'hv', 105=>'i', 9432=>'i', 65353=>'i', 236=>'i', 237=>'i', 238=>'i', 297=>'i', 299=>'i', 301=>'i', 239=>'i', 7727=>'i', 7881=>'i', 464=>'i', 521=>'i', 523=>'i', 7883=>'i', 303=>'i', 7725=>'i', 616=>'i', 305=>'i', 106=>'j', 9433=>'j', 65354=>'j', 309=>'j', 496=>'j', 585=>'j', 107=>'k', 9434=>'k', 65355=>'k', 7729=>'k', 489=>'k', 7731=>'k', 311=>'k', 7733=>'k', 409=>'k', 11370=>'k', 42817=>'k', 42819=>'k', 42821=>'k', 42915=>'k', 108=>'l', 9435=>'l', 65356=>'l', 320=>'l', 314=>'l', 318=>'l', 7735=>'l', 7737=>'l', 316=>'l', 7741=>'l', 7739=>'l', 383=>'l', 322=>'l', 410=>'l', 619=>'l', 11361=>'l', 42825=>'l', 42881=>'l', 42823=>'l', 457=>'lj', 109=>'m', 9436=>'m', 65357=>'m', 7743=>'m', 7745=>'m', 7747=>'m', 625=>'m', 623=>'m', 110=>'n', 9437=>'n', 65358=>'n', 505=>'n', 324=>'n', 241=>'n', 7749=>'n', 328=>'n', 7751=>'n', 326=>'n', 7755=>'n', 7753=>'n', 414=>'n', 626=>'n', 329=>'n', 42897=>'n', 42917=>'n', 460=>'nj', 111=>'o', 9438=>'o', 65359=>'o', 242=>'o', 243=>'o', 244=>'o', 7891=>'o', 7889=>'o', 7895=>'o', 7893=>'o', 245=>'o', 7757=>'o', 557=>'o', 7759=>'o', 333=>'o', 7761=>'o', 7763=>'o', 335=>'o', 559=>'o', 561=>'o', 246=>'o', 555=>'o', 7887=>'o', 337=>'o', 466=>'o', 525=>'o', 527=>'o', 417=>'o', 7901=>'o', 7899=>'o', 7905=>'o', 7903=>'o', 7907=>'o', 7885=>'o', 7897=>'o', 491=>'o', 493=>'o', 248=>'o', 511=>'o', 596=>'o', 42827=>'o', 42829=>'o', 629=>'o', 419=>'oi', 547=>'ou', 42831=>'oo', 112=>'p', 9439=>'p', 65360=>'p', 7765=>'p', 7767=>'p', 421=>'p', 7549=>'p', 42833=>'p', 42835=>'p', 42837=>'p', 113=>'q', 9440=>'q', 65361=>'q', 587=>'q', 42839=>'q', 42841=>'q', 114=>'r', 9441=>'r', 65362=>'r', 341=>'r', 7769=>'r', 345=>'r', 529=>'r', 531=>'r', 7771=>'r', 7773=>'r', 343=>'r', 7775=>'r', 589=>'r', 637=>'r', 42843=>'r', 42919=>'r', 
42883=>'r', 115=>'s', 9442=>'s', 65363=>'s', 223=>'s', 347=>'s', 7781=>'s', 349=>'s', 7777=>'s', 353=>'s', 7783=>'s', 7779=>'s', 7785=>'s', 537=>'s', 351=>'s', 575=>'s', 42921=>'s', 42885=>'s', 7835=>'s', 116=>'t', 9443=>'t', 65364=>'t', 7787=>'t', 7831=>'t', 357=>'t', 7789=>'t', 539=>'t', 355=>'t', 7793=>'t', 7791=>'t', 359=>'t', 429=>'t', 648=>'t', 11366=>'t', 42887=>'t', 42793=>'tz', 117=>'u', 9444=>'u', 65365=>'u', 249=>'u', 250=>'u', 251=>'u', 361=>'u', 7801=>'u', 363=>'u', 7803=>'u', 365=>'u', 252=>'u', 476=>'u', 472=>'u', 470=>'u', 474=>'u', 7911=>'u', 367=>'u', 369=>'u', 468=>'u', 533=>'u', 535=>'u', 432=>'u', 7915=>'u', 7913=>'u', 7919=>'u', 7917=>'u', 7921=>'u', 7909=>'u', 7795=>'u', 371=>'u', 7799=>'u', 7797=>'u', 649=>'u', 118=>'v', 9445=>'v', 65366=>'v', 7805=>'v', 7807=>'v', 651=>'v', 42847=>'v', 652=>'v', 42849=>'vy', 119=>'w', 9446=>'w', 65367=>'w', 7809=>'w', 7811=>'w', 373=>'w', 7815=>'w', 7813=>'w', 7832=>'w', 7817=>'w', 11379=>'w', 120=>'x', 9447=>'x', 65368=>'x', 7819=>'x', 7821=>'x', 121=>'y', 9448=>'y', 65369=>'y', 7923=>'y', 253=>'y', 375=>'y', 7929=>'y', 563=>'y', 7823=>'y', 255=>'y', 7927=>'y', 7833=>'y', 7925=>'y', 436=>'y', 591=>'y', 7935=>'y', 122=>'z', 9449=>'z', 65370=>'z', 378=>'z', 7825=>'z', 380=>'z', 382=>'z', 7827=>'z', 7829=>'z', 438=>'z', 549=>'z', 576=>'z', 11372=>'z', 42851=>'z' 
  };

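String removeDiacritics(String str) {
  // Build the output as a list of code points, one input character at a time.
  List<Integer> result = new List<Integer>();
  for (Integer chr: str.getChars()) {
    if (sec.containsKey(chr)) {
      // Mapped character: append the replacement, which may be several letters (e.g. 'AE').
      result.addAll(sec.get(chr).getChars());
    } else {
      // Unmapped character: keep it unchanged.
      result.add(chr);
    }
  }
  return String.fromCharArray(result);
}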
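
A possible refinement, sketched here under assumptions and not benchmarked: convert each replacement string to its code points once, when the map is set up, so the loop does a single lookup per character and no getChars call on the replacement. The class name FastAccents and the three-entry table are hypothetical stand-ins for the full map above:

```apex
// Hypothetical variant that precomputes the replacement code points.
public class FastAccents
{
    // Illustrative subset of the code-point table above.
    private static final Map<Integer, String> SOURCE = new Map<Integer, String>
    {
        233 => 'e', 198 => 'AE', 7718 => 'H'
    };

    // Built once, at class initialization.
    private static final Map<Integer, List<Integer>> EXPANDED = expand(SOURCE);

    private static Map<Integer, List<Integer>> expand(Map<Integer, String> source)
    {
        Map<Integer, List<Integer>> expanded = new Map<Integer, List<Integer>>();
        for (Integer codePoint : source.keySet())
        {
            expanded.put(codePoint, source.get(codePoint).getChars());
        }
        return expanded;
    }

    public static String removeDiacritics(String str)
    {
        List<Integer> result = new List<Integer>();
        for (Integer chr : str.getChars())
        {
            // One map lookup per character; null means "not in the table".
            List<Integer> replacement = EXPANDED.get(chr);
            if (replacement != null)
            {
                result.addAll(replacement);
            }
            else
            {
                result.add(chr);
            }
        }
        return String.fromCharArray(result);
    }
}
```

The trade-off is a little more heap for the second map in exchange for less work per character.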
Ah, now I see what you mean. Might take a look at the performance difference tomorrow if no one beats me to it. – Adrian Larson 5 hours ago
    
@AdrianLarson The nice thing about your way is that it's predictable. This way, it varies based on string length. Probably would be better to run the getChars when setting up the Map, though, instead of inline. – Charles Koppelman 5 hours ago
    
Honestly, I just did a direct translation of a published solution, which is what you asked for. I didn't put a ton of thought into optimization, since it was a bit of a pain to compile. If you care about speed, it seems likely that this approach would win, even if it's a readability hit. Looks like the difference is about 10x with the input string provided in your OP. – Adrian Larson 5 hours ago
@AdrianLarson Thank you for the direct translation of a published solution. That is what I asked for and much more thorough than any list I could have ever compiled. This was just an iterative improvement (unless heap size is a problem). – Charles Koppelman 5 hours ago
    
It's a nice improvement! Thanks for sharing! – Adrian Larson 5 hours ago

There's no simple transform method built into the library. If you wanted to take the time to map them all out, you could use a long series of String.replaceAll calls to go through the process manually, something like:

myString = myString.replaceAll('[ÀÁÂÃÄÅ]','A');
myString = myString.replaceAll('[àáâãäå]','a');
// ....

This would undoubtedly be painful, but it is possible.

Painful and erroneous since it would omit deletion of combining diacritics (like the e followed by accent grave), but it may be what I'm forced to do.... – Charles Koppelman 10 hours ago
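
The combining-mark case can be handled as a separate pass. The sketch below uses a hypothetical class name and assumes the input is already decomposed (a base letter followed by a combining mark). It removes only the Combining Diacritical Marks block, U+0300 through U+036F; other combining blocks exist and are not covered:

```apex
// Hypothetical helper: deletes combining marks in the range U+0300..U+036F.
public class CombiningMarks
{
    public static String strip(String text)
    {
        if (text == null)
        {
            return null;
        }
        // The \u escapes are resolved in the string literal, so the regex
        // sees a character class spanning the two literal code points.
        return text.replaceAll('[\u0300-\u036F]', '');
    }
}
```

For example, CombiningMarks.strip('e\u0300') returns 'e'. Precomposed characters such as U+00E8 are untouched by this pass, so it complements the lookup tables rather than replacing them.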
