detfalskested

Reimplementing Microsoft SQL Server `PWDENCRYPT` / `PWDCOMPARE` in Python

This was originally posted as a self-answered question on Stack Overflow.

The Background Story

We're currently rebuilding our platform in Python and Django. The existing platform is based on .NET and related MS technologies. We're soon preparing to launch, and part of that is also migrating user data from the old to the new platform.

In the old system, user passwords are hashed and verified using the built-in PWDENCRYPT and PWDCOMPARE functions of Microsoft SQL Server.

As we want the transition to be as frictionless as possible for our users, we of course also want them to not have to reset their password after we launch. So I got the seemingly simple task to create a custom Django password hasher that is able to verify user entered passwords againts the hashes from PWDENCRYPT.

First I set out to find some documentation on what those MS SQL functions actually do. It seems to officially not be very well documented what happens behind the scenes (if at all), and according to the docs you're not really supposed to use it:

PWDENCRYPT is an older function and might not be supported in a future release of SQL Server. Use HASHBYTES instead. HASHBYTES provides more hashing algorithms.

Fortunately, a couple of Stack Overflow questions were able to guide me in the right direction, although not all the way, due to errors in their explanations and examples, which sent me on several detours.

After spending way too much time on it, I gave up. But then a colleague gave it a try and found the missing key: The passwords must be encoded using UTF-16LE. A detail I missed, but might be obvious to someone who is used to work in a Windows environment.

The Solution

So, let's go through it step by step.

PWDENCRYPT

Let's try generating a password hash in SQL Server and look at the result:

SELECT PWDENCRYPT("very-secret-passwørd")
GO

0x02008D6818FFB31BAAA6F849583A0C1319E1E73A260525CFEC22D4BC9FE2FBCC535C682342634D5B87B2ED1A1DA2C63B7FED3CCDFC82DE31DD5670CCAC9BA7BA5866EB893843

The result consists of 4 parts:

0x an indication that this is a hexadecimal value.

0200 tells us the version of the hashing algorithm used. In this case, it tells us that it's hashed using SHA-512 (or SHA2_512 in the vocabulary of SQL Server). If the header is 0100, it is instead hashed using SHA1.

8D6818FF is the salt for the hash.

B31BAAA6F849583A0C1319E1E73A260525CFEC22D4BC9FE2FBCC535C682342634D5B87B2ED1A1DA2C63B7FED3CCDFC82DE31DD5670CCAC9BA7BA5866EB893843

Is the actual salted and hashed password.

PWDCOMPARE

This function is used to compare a password against a salted hash. Let's try it out:

SELECT PWDCOMPARE("very-secret-passwørd", 0x02008D6818FFB31BAAA6F849583A0C1319E1E73A260525CFEC22D4BC9FE2FBCC535C682342634D5B87B2ED1A1DA2C63B7FED3CCDFC82DE31DD5670CCAC9BA7BA5866EB893843)
GO

-----------
          1

1 is a boolean value, indicating that the password matched the hashed password. Using another hash, the result will be 0:

SELECT PWDCOMPARE("very-secret-passwørd", 0x02008D6818FFB31BAAA6F849583A0C1319E1E73A260525CFEC22D4BC9FE2FBCC535C682342634D5B87B2ED1A1DA2C63B7FED3CCDFC82DE31DD5670CCAC9BA7BA5866EB893800)
GO

-----------
          0

I changed the last 2 digits of the hash to be 00 and got a 0 back: The password did not match the hashed password. So far so good.

Reimplementing in SQL Using HASHBYTES

Now, in order to understand what's going on, I sort of reimplemented this with HASHBYTES based on the information found in the previously mentioned Stack Overflow questions:

SELECT HASHBYTES("SHA2_512", "very-secret-passwørd" + CONVERT(NVARCHAR(MAX), 0x8D6818FF))

0xB31BAAA6F849583A0C1319E1E73A260525CFEC22D4BC9FE2FBCC535C682342634D5B87B2ED1A1DA2C63B7FED3CCDFC82DE31DD5670CCAC9BA7BA5866EB893843

Yay! I get the same hash. All good.

Reimplementing in Python

And then the grand finale: Doing the same in Python. This is where I ended up spending days in agony, but I will save you from it and show you what to do instead.

>>> import hashlib
>>> hashlib.sha512("very-secret-passwørd".encode("utf-16le") + bytes.fromhex("8D6818FF")).hexdigest()
'b31baaa6f849583a0c1319e1e73a260525cfec22d4bc9fe2fbcc535c682342634d5b87b2ed1a1da2c63b7fed3ccdfc82de31dd5670ccac9ba7ba5866eb893843'

There you go! Adding .upper() to the result allows us to compare the hashes from SQL Server and Python:

>>> hashlib.sha512("very-secret-passwørd".encode("utf-16le") + bytes.fromhex("8D6818FF")).hexdigest().upper() == "B31BAAA6F849583A0C1319E1E73A260525CFEC22D4BC9FE2FBCC535C682342634D5B87B2ED1A1DA2C63B7FED3CCDFC82DE31DD5670CCAC9BA7BA5866EB893843"
True