=pod

=encoding utf8

=head1 NAME

passphrase-encoding
- How diverse parts of OpenSSL treat pass phrases character encoding

=head1 DESCRIPTION

In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview of how this problem is
currently addressed in different parts of the OpenSSL library.

=head2 The general case

The OpenSSL library doesn't treat pass phrases in any special way as a general
rule, and trusts the application or user to choose a suitable character set
and stick to that throughout the lifetime of affected objects.
This means that for an object that was encrypted using a pass phrase encoded in
ISO-8859-1, that object needs to be decrypted using a pass phrase encoded in
ISO-8859-1.
Using the wrong encoding is expected to cause a decryption failure.

=head2 PKCS#12

PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an ASN.1
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (UCS-2 BE).

OpenSSL tries to adapt to this requirement in one of the following manners:

=over 4

=item 1.

Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to
UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but becomes an expansion for any other character), or failing that,
proceeds with step 2.

=item 2.

Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prepends each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.

Note that since there is no check of your locale, this may produce UCS-2 /
UTF-16 characters that do not correspond to the original pass phrase characters
for other character sets, such as any ISO-8859-X encoding other than
ISO-8859-1 (or for Windows, CP 1252 with the exception of the extra "graphical"
characters in the 0x80-0x9F range).

=back

OpenSSL versions older than 1.1.0 do variant 2 only, and that is the reason why
OpenSSL still does this, to be able to read files produced with older versions.

It should be noted that this approach isn't entirely fault free.

A pass phrase encoded in ISO-8859-2 could very well have a sequence such as
0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE"
and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but would
be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF (LATIN
SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain anything that
would be invalid UTF-8>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.

 0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0
 0x00 0xEF                              # OpenSSL 1.1.0 and newer

By the same token, anything encoded in UTF-8 that was given to OpenSSL older
than 1.1.0 was misinterpreted as ISO-8859-1 sequences.

=head2 OSSL_STORE

L<ossl_store(7)> acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a PIN or something else.
This API stipulates that pass phrases should be UTF-8 encoded, and that any
other pass phrase encoding may give undefined results.
This API relies on the application to ensure UTF-8 encoding, and doesn't check
that this is the case, so what it gets, it will also pass to the underlying
loader.
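
As a rough illustration of the two conversion variants described under
PKCS#12 above, the following Python sketch mimics them on raw bytes.
It is not OpenSSL code: the function names C<bmp_legacy> and C<bmp_current>
are hypothetical, the terminating zero bytes of a real BMPString pass phrase
are left out, and Python's rules for what counts as valid UTF-8 may differ in
detail from those of OpenSSL's decoder.

```python
def bmp_legacy(passphrase: bytes) -> bytes:
    """Variant 2: prefix every input byte with a zero byte (ISO-8859-1 assumed)."""
    return b"".join(bytes([0, b]) for b in passphrase)


def bmp_current(passphrase: bytes) -> bytes:
    """Variant 1 with fallback: try UTF-8 -> UTF-16BE, otherwise use variant 2."""
    try:
        text = passphrase.decode("utf-8")
    except UnicodeDecodeError:
        return bmp_legacy(passphrase)
    return text.encode("utf-16-be")


# 0xC3 0xAF: two characters in ISO-8859-2, but also valid UTF-8 for U+00EF.
ambiguous = bytes([0xC3, 0xAF])
print(bmp_legacy(ambiguous).hex())   # 00c300af
print(bmp_current(ambiguous).hex())  # 00ef

# "naive" with i-diaeresis in ISO-8859-1 contains a lone 0xEF, which is not
# valid UTF-8, so the fallback applies and both variants agree.
latin1 = "na\u00efve".encode("iso-8859-1")
print(bmp_current(latin1) == bmp_legacy(latin1))  # True
```

The ambiguous two-byte input is the case where the two variants diverge; the
ISO-8859-1 input shows the fallback that keeps older files readable.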

=head1 RECOMMENDATIONS

This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was ISO-8859-1 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is UTF-8 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's just as legitimate to use any byte sequence (such as a
sequence of bytes from F</dev/urandom> that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.

=head2 Creating new objects

For creating new pass phrase protected objects, make sure the pass phrase is
encoded using UTF-8.
This is the default on most modern Unixes, but may involve an effort on other
platforms.
Specifically for Windows, setting the environment variable
C<OPENSSL_WIN32_UTF8> will have anything entered on the Windows console prompt
converted to UTF-8 (command line and separately prompted pass phrases alike).
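
The byte sequences given at the start of RECOMMENDATIONS, and the conversion
of a pass phrase to UTF-8 before creating a new object, can be reproduced with
a minimal Python sketch.
The helper name C<to_utf8> is hypothetical, and the sketch assumes that you
already know which encoding the original bytes are in.

```python
def to_utf8(passphrase: bytes, source_encoding: str) -> bytes:
    """Re-encode pass phrase bytes from a known source encoding to UTF-8."""
    return passphrase.decode(source_encoding).encode("utf-8")


# "naive" with i-diaeresis (U+00EF), as typed in an ISO-8859-1 environment.
old = "na\u00efve".encode("iso-8859-1")
print(old.hex())                         # 6e61ef7665

# The same pass phrase as a UTF-8 environment would produce it.
print(to_utf8(old, "iso-8859-1").hex())  # 6e61c3af7665
```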

=head2 Opening existing objects

For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:

=over 4

=item 1.

Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.

=item 2.

Convert the pass phrase to UTF-8 and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.

=item 3.

Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try
with the result.
This differs from the previous attempt because ISO-8859-1 maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.

This also takes care of the case when a UTF-8 encoded string was used with
OpenSSL older than 1.1.0.
For example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3
0x83 0xC2 0xAF when re-encoded in the naïve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xAF 0x00 0x00, the
erroneous/non-compliant encoding used by OpenSSL older than 1.1.0.

=back

=head1 SEE ALSO

L<evp(7)>,
L<ossl_store(7)>,
L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>,
L<PEM_do_header(3)>,
L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>,
L<d2i_PKCS8PrivateKey_bio(3)>

=head1 COPYRIGHT

Copyright 2018-2020 The OpenSSL Project Authors. All Rights Reserved.

Licensed under the OpenSSL license (the "License").  You may not use
this file except in compliance with the License.
You can obtain a copy
in the file LICENSE in the source distribution or at
L<https://www.openssl.org/source/license.html>.

=cut