The Developer Guide is a "first principles" book - it's not specific to any one language or framework, as they all borrow ideas and syntax from each other. There are highly specific issues in different languages, such as PHP configuration settings or Spring MVC issues, but we need to look past these differences and apply the basic tenets of secure system engineering to application security.
We are refactoring the original material from the Developer Guide 2. The primary audience for the new version of the Developer Guide is architects and developers. The Developer Guide can still be used by penetration testers who want to move up to software verification or improve their craft, but the primary focus will be how to implement secure software from first principles.
It is licensed under the http:

The primary contributors to date have been:

If you are one of them and not on this list, please contact Brad or Steven. See more detailed instructions on the FAQ page.

When will the new version be released? We all have day jobs and do this as an altruistic endeavor to make the world a better place; we are making progress as fast as we can.

I only have a few minutes per month to help!
How do I get involved? Please join the mail list, introduce yourself, and go find something that needs fixing in the GitHub issue list.

Not familiar with git? Get the Dev Guide from GitHub, make edits on your machine, and email me your work; I will commit it to the project, "blaming" you in the commit message via a parseable format that I can extract for attribution.

I really want to help big time. Please join the mail list, introduce yourself, find something that needs writing or is missing on GitHub, write the first draft, and mail it to us on the mail list.
We'll take it from there!

The most restrictive, and arguably most desirable, option is to reject invalid input entirely, without feedback, and make sure the incident is noted through logging or monitoring. But why without feedback? Should we provide our user with information about why the data is invalid? It depends a bit on your contract.
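To make this concrete, here is a minimal sketch of a handler that rejects anything outside the expected set. The function and field names are illustrative, not from any specific framework:

```python
ALLOWED = {"email", "text"}  # the contract: only these two values are valid

def send_error(message: str) -> str:
    # Stand-in for a framework's error response mechanism.
    return message

def handle_form(communication_type: str) -> str:
    if communication_type not in ALLOWED:
        # Reject entirely; nothing the user typed is echoed back.
        return send_error("You must choose email or text")
    return "ok"

print(handle_form("email"))           # ok
print(handle_form("carrier pigeon"))  # You must choose email or text
```

Note that validation compares against a known-good set rather than searching for known-bad values.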
In the form example above, if you receive any value other than "email" or "text", something funny is going on. Further, the feedback mechanism itself might provide the point of attack. Imagine the sendError method writes the text back to the screen as an error message like "We're unable to respond with communicationType". That's all fine if the communicationType is "carrier pigeon", but what happens if it looks like this? You're now faced with the possibility of a reflected XSS attack that steals session cookies.
If you must provide user feedback, you are best served with a canned response that doesn't echo back untrusted user data, for example "You must choose email or text". If you really can't avoid rendering the user's input back at them, make absolutely sure it's properly encoded (see below for details on output encoding). Rejecting input that contains known dangerous values is a strategy referred to as negative validation or blacklisting.
The trouble with this approach is that the number of possible bad inputs is extremely large.
Maintaining a complete list of potentially dangerous input would be a costly and time-consuming endeavor, and it would need to be continually updated. But sometimes it's your only option, for example in cases of free-form input. Resist the temptation to filter out invalid input, a practice commonly called "sanitization".
It is essentially a blacklist that removes undesirable input rather than rejecting it. Like other blacklists, it is hard to get right and provides the attacker with more opportunities to evade it. An attacker could bypass it with something as simple as nesting one blacklisted token inside another: even though your blacklist caught the attack, by removing the matched text it stitched the surrounding fragments back together into the very payload it was meant to block, reintroducing the vulnerability.
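A toy illustration of why removal-based sanitization backfires. The filter below is a deliberately naive sketch, not a real sanitizer:

```python
def naive_sanitize(text: str) -> str:
    # Blacklist approach: strip script tags wherever they appear.
    return text.replace("<script>", "").replace("</script>", "")

# The attacker nests one blacklisted token inside another...
payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"

# ...so removing the inner tokens reassembles the attack payload.
cleaned = naive_sanitize(payload)
print(cleaned)  # <script>alert(1)</script>
```

The filter did exactly what it was told, and the output is the very string it was supposed to remove.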
Input validation functionality is built into most modern frameworks and, when absent, can be found in external libraries that let the developer apply multiple constraints as rules on a per-field basis. Built-in validation of common patterns like email addresses and credit card numbers is a helpful bonus. Using your web framework's validation provides the additional advantage of pushing the validation logic to the very edge of the web tier, so invalid data is rejected before it ever reaches complex application code, where critical mistakes are easier to make.
Although this section focused on using input validation as a mechanism for protecting your form handling code, any code that handles input from an untrusted source can be validated in much the same way, whether the message is JSON, XML, or any other format, and regardless of whether it's a cookie, a header, or a URL parameter.
If it violates the contract, reject it!

In addition to limiting data coming into an application, web application developers need to pay close attention to the data as it comes out. A modern web application usually has basic HTML markup for document structure, CSS for document style, JavaScript for application logic, and user-generated content, which can be any of these things. And it's often all rendered to the same document.
The developer is always one errant angle bracket away from running in a very different execution context than they intend. This is further complicated when you have additional context-specific content embedded within an execution context. HTML is a very, very permissive format.
Browsers try their best to render content even if it is malformed. That may seem beneficial to the developer, since a bad bracket doesn't just explode in an error; however, the rendering of badly formed markup is a major source of vulnerabilities. Attackers have the luxury of injecting content into your pages to break through execution contexts, without even having to worry about whether the page is valid.
Handling output correctly isn't strictly a security concern. Applications rendering data from sources like databases and upstream services need to ensure that the content doesn't break the application, but risk becomes particularly high when rendering content from an untrusted source. This is where output encoding comes in. Output encoding is converting outgoing data to a final output format.
The complication with output encoding is that you need a different codec depending on how the outgoing data is going to be consumed. Without appropriate output encoding, an application could provide its client with malformed data, making it unusable, or, even worse, dangerous. An attacker who stumbles across insufficient or inappropriate encoding knows that they have a potential vulnerability that might allow them to fundamentally alter the structure of the output from the developer's intent. For example, imagine that one of the first customers of a system is the former Supreme Court justice Sandra Day O'Connor.
What happens if her name is rendered into HTML? All is right with the world; the page is generated as we would expect. But these strings are going to show up in JavaScript, too. What happens when the page outputs her name, unencoded, into a JavaScript string? The result is malformed JavaScript: the apostrophe in "O'Connor" closes the string literal early. This is exactly what hackers look for to break through an execution context and turn innocent data into dangerous executable code. If, however, we correctly encode the output for a JavaScript context, the apostrophe is escaped. A bit confusing to read, perhaps, but a perfectly harmless, non-executable string.

Note: there are a couple of strategies for encoding JavaScript. The good news is that most modern web frameworks have mechanisms for rendering content safely and escaping reserved characters. The bad news is that most of these frameworks also include a mechanism for circumventing this protection, and developers often use it, either out of ignorance or because they rely on it to render executable code that they believe to be safe.
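Python's standard library can illustrate context-appropriate encoding: html.escape handles an HTML context, while json.dumps produces a safely quoted literal suitable for a JavaScript double-quoted context. This is a sketch of the idea, not a complete templating solution:

```python
import html
import json

name = "O'Connor"

# HTML context: the apostrophe becomes a character reference.
print(html.escape(name))  # O&#x27;Connor

# JavaScript context: emit a properly quoted, escaped string literal.
print(json.dumps(name))   # "O'Connor"
```

The same input needs a different codec per destination context, which is exactly why a single "encode everything once" step doesn't work.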
There are so many tools and frameworks these days, and so many encoding contexts (HTML body, HTML attribute, JavaScript, CSS, URL), that it is impossible to cover them all here. If you are using another framework, check the documentation for safe output encoding functions. You need to be aware of which context a particular encoding tool is written for. One tempting pattern is to encode content once, before storing it; this pattern will generally bite you later on. If you encode the text as HTML prior to storage, you run into problems if you need to render the data in another format: it adds a great deal of complexity and encourages developers to write unescaping code inside their application, making all the tricky upstream output encoding effectively useless.
You are much better off storing the data in its most raw form, then handling encoding at rendering time. Finally, it's worth noting that nested rendering contexts add an enormous amount of complexity and should be avoided whenever possible. It's hard enough to get a single output string right, but when you are rendering a URL, in HTML within JavaScript, you have three contexts to worry about for a single string.
If you absolutely cannot avoid nested contexts, make sure to decompose the problem into separate stages, thoroughly test each one, and pay special attention to the order of rendering.

In summary:

- Output encode all application data on output with an appropriate codec
- Use your framework's output encoding capability, if available
- Avoid nested rendering contexts as much as possible
- Store your data in raw form and encode at rendering time
- Avoid unsafe framework and JavaScript calls that bypass encoding
Whether you are writing SQL against a relational database, using an object-relational mapping framework, or querying a NoSQL database, you probably need to worry about how input data is used within your queries. The database is often the most crucial part of any web application since it contains state that can't be easily restored. It can contain crucial and sensitive customer information that must be protected. It is the data that drives the application and runs the business. So you would expect developers to take the most care when interacting with their database, and yet injection into the database tier continues to plague the modern web application even though it's relatively easy to prevent!
No discussion of parameter binding would be complete without the famous "Little Bobby Tables" strip from xkcd. To decompose this comic, imagine the system responsible for keeping track of grades has a function for adding new students that builds its INSERT statement by concatenating the student's name into the SQL. The final "--" in the injected name comments out the remainder of the original query, ensuring the SQL syntax is valid.
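The vulnerable function might look something like this sketch (the function name and schema are illustrative); note how the "name" lands inside the SQL itself:

```python
def add_student_unsafe(name: str) -> str:
    # Vulnerable: untrusted input is concatenated straight into the SQL.
    return "INSERT INTO students (name) VALUES ('" + name + "');"

sql = add_student_unsafe("Robert'); DROP TABLE students;--")
print(sql)
# INSERT INTO students (name) VALUES ('Robert'); DROP TABLE students;--');
```

The attacker's quote and close-paren terminate the intended statement, a second statement follows, and the comment marker swallows the leftover syntax.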
Et voilà, the DROP is executed. This attack vector allows the user to execute arbitrary SQL within the context of the application's database user. In other words, the attacker can do anything the application can do and more, which could result in attacks that cause greater harm than a DROP, including violating data integrity, exposing sensitive information, or inserting executable code.
Later we will talk about defining different users as a secondary defense against this kind of mistake, but for now, suffice to say that there is a very simple application-level strategy for minimizing injection risk. To quibble with Hacker Mom's solution, sanitizing is very difficult to get right, creates new potential attack vectors and is certainly not the right approach.
Your best, and arguably only decent option is parameter binding. Parameter binding provides a means of separating executable code, such as SQL, from content, transparently handling content encoding and escaping. Any full-featured data access layer will have the ability to bind variables and defer implementation to the underlying protocol.
This way, the developer doesn't need to understand the complexities that arise from mixing user input with executable code. For this to be effective, all untrusted inputs need to be bound. If SQL is built through concatenation, interpolation, or formatting methods, no part of the resulting string should come from user input. Sometimes we encounter situations where there is tension between good security and clean code.
Security sometimes requires the programmer to add some complexity in order to protect the application. In this case however, we have one of those fortuitous situations where good security and good design are aligned. In addition to protecting the application from injection, introducing bound parameters improves comprehensibility by providing clear boundaries between code and content, and simplifies creating valid SQL by eliminating the need to manage the quotes by hand. As you introduce parameter binding to replace your string formatting or concatenation, you may also find opportunities to introduce generalized binding functions to the code, further enhancing code cleanliness and security.
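With Python's built-in sqlite3 module, for example, parameter binding keeps Bobby Tables' name as inert content. A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# The ? placeholder binds the value; the driver handles quoting and escaping,
# so the hostile string can never change the structure of the statement.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

# The injected SQL was stored as plain data; the table still exists.
stored = conn.execute("SELECT name FROM students").fetchone()[0]
print(stored)  # Robert'); DROP TABLE students;--
```

The code is also easier to read than the concatenated version: the boundary between the SQL and the content is visible at a glance.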
This highlights another place where good design and good security overlap. There is a misconception that stored procedures prevent SQL injection, but that is only true insofar as parameters are bound inside the stored procedure. If the stored procedure itself does string concatenation, it can be injectable as well, and binding the variable from the client won't save you. Similarly, object-relational mapping frameworks like ActiveRecord, Hibernate, or .NET Entity Framework won't protect you unless you are using their binding functions. If you build your queries from untrusted input without binding, the application can still be vulnerable to an injection attack.
Finally, there is a misconception that NoSQL databases are not susceptible to injection attacks; that is not true. All query languages, SQL or otherwise, require a clear separation between executable code and content so the execution environment doesn't confuse a command with a parameter. Attackers look for points in the runtime where they can break through those boundaries and use input data to change the intended execution path.
The bottom line is that you need to check the data store and driver documentation for safe ways to handle input data. Check the matrix below for an indication of the safe binding functions of your chosen data store. If it is not included in the list, check the product documentation.

While we're on the subject of input and output, there's another important consideration: the security of data in transit. When using an ordinary HTTP connection, users are exposed to many risks arising from the fact that data is transmitted in plaintext.
An attacker capable of intercepting network traffic anywhere between a user's browser and a server can eavesdrop or even tamper with the data completely undetected in a man-in-the-middle attack. There is no limit to what the attacker can do, including stealing the user's session or their personal information, injecting malicious code that will be executed by the browser in the context of the website, or altering data the user is sending to the server. We can't usually control the network our users choose to use. They might have unsuspectingly connected to a hostile wireless network with a name like "Free Wi-Fi" set up by an attacker in a public place.
They might be using an internet provider that injects content such as ads into their web traffic, or they might even be in a country where the government routinely surveils its citizens. If an attacker can eavesdrop on a user or tamper with web traffic, all bets are off. The data exchanged cannot be trusted by either side.
HTTPS was originally used mainly to secure sensitive web traffic such as financial transactions, but it is now common to see it used by default on many sites we use in our day to day lives such as social networking and search engines. When configured and used correctly, it provides protection against eavesdropping and tampering, along with a reasonable guarantee that a website is the one we intend to be using. Or, in more technical terms, it provides confidentiality and data integrity, along with authentication of the website's identity. With the many risks we all face, it increasingly makes sense to treat all network traffic as sensitive and encrypt it.
Several browser makers have announced their intent to deprecate non-secure HTTP and even display visual indications to users to warn them when a site is not using HTTPS. So why aren't we using it for everything now? For a long time, it was perceived as being too computationally expensive to use for all traffic, but with modern hardware that has not been the case for some time.
The cost of obtaining a certificate from a certificate authority also deterred adoption, but the introduction of free services like Let's Encrypt has eliminated that barrier. Today there are fewer hurdles than ever before. The ability to authenticate the identity of a website underpins the security of TLS. In the absence of the ability to verify that a site is who it says it is, an attacker capable of doing a man-in-the-middle attack could impersonate the site and undermine any other protection the protocol provides.
When using TLS, a site proves its identity using a public key certificate. This certificate contains information about the site along with a public key that is used to prove that the site is the owner of the certificate, which it does using a corresponding private key that only it knows. In some systems a client may also be required to use a certificate to prove its identity, although this is relatively rare in practice today due to complexities in managing certificates for clients. Unless the certificate for a site is known in advance, a client needs some way to verify that the certificate can be trusted.
This is done based on a model of trust. In web browsers and many other applications, a trusted third party called a Certificate Authority CA is relied upon to verify the identity of a site and sometimes of the organization that owns it, then grant a signed certificate to the site to certify it has been verified.
It isn't always necessary to involve a trusted third party if the certificate is known in advance by sharing it through some other channel. For example, a mobile app or other application might be distributed with a certificate or information about a custom CA that will be used to verify the identity of the site. This practice is referred to as certificate or public key pinning and is outside the scope of this article.
The most visible indicator of security that many web browsers display is shown when communications with a site are secured using HTTPS and the certificate is trusted. Without it, a browser will display a warning about the certificate and prevent a user from viewing your site, so it is important to get a certificate from a trusted CA. It is possible to generate your own certificate to test an HTTPS configuration, but you will need a certificate signed by a trusted CA before exposing the service to users.
For many uses, a free CA is a good starting point. When searching for a CA, you will encounter different levels of certification offered. The most basic, Domain Validation (DV), certifies that the owner of the certificate controls a domain. Although the more advanced options result in a more positive visual indicator of security in the browser, the extra cost may not be worth it for many.
At first glance, configuring TLS may seem like a task worthy of someone who holds a PhD in cryptography. You may want to choose a configuration that supports a wide range of browser versions, but you need to balance that against providing a high level of security and maintaining some level of performance. The cryptographic algorithms and protocol versions supported by a site have a strong impact on the level of communications security it provides. Attacks with impressive-sounding names like FREAK, DROWN, and POODLE (admittedly, the last one doesn't sound all that formidable) have shown us that supporting dated protocol versions and algorithms presents a risk of browsers being tricked into using the weakest option supported by a server, making attack much easier.
Advancements in computing power and our understanding of the mathematics underlying algorithms also render them less safe over time. How can we balance staying up to date with making sure our website remains compatible with a broad assortment of users who might be using dated browsers that only support older protocol versions and algorithms? Fortunately, there are tools that help make the job of selection a lot easier.
Note that the configuration generator mentioned above enables a browser security feature called HSTS by default, which might cause problems until you're ready to commit to using HTTPS for all communications long term. We'll discuss HSTS a little later in this article. It is not uncommon to encounter a website where HTTPS is used to protect only some of the resources it serves.
In some cases the protection might only be extended to handling form submissions that are considered sensitive. Other times, it might only be used for resources that are considered sensitive, for example what a user might access after logging into the site. The trouble with this inconsistent approach is that anything that isn't served over HTTPS remains susceptible to the kinds of risks that were outlined earlier. For example, an attacker doing a man-in-the-middle attack could simply alter the form mentioned above to submit sensitive data over plaintext HTTP instead.
If the attacker injects executable code that will be executed in the context of our site, it isn't going to matter much that part of it is protected with HTTPS. Web browsers default to using HTTP when a user enters an address into their address bar without typing "https://". As a result, simply shutting down the HTTP network port is rarely an option. For resources that will be accessed by web browsers, adopting a policy of redirecting all HTTP requests to HTTPS is the first step towards using HTTPS consistently. Not all API clients are able to handle redirects, however.
Enabling it is as simple as sending a header in a response:

Strict-Transport-Security: max-age=15552000

The above header instructs the browser to interact with the site only over HTTPS for a period of six months (specified in seconds).
HSTS is an important feature to enable due to the strict policy it enforces. It also instructs the browser to disallow the user from bypassing the warning it displays if an invalid certificate is encountered when loading the site. On the server side, enabling HSTS can require as little as a single line of configuration; in Apache, for example, it is enabled by adding a Header directive within the VirtualHost configuration for port 443.

Now that you have an understanding of some of the risks inherent to ordinary HTTP, you might be scratching your head wondering what happens when the first request to a website is made over HTTP, before HSTS can take effect.
To address this risk, some browsers allow websites to be added to an "HSTS Preload List" that ships with the browser. Once a website is included in this list, it will no longer be possible to access it over HTTP, even the first time a browser interacts with the site. Before deciding to enable HSTS, some potential challenges must first be considered. We don't always have control over how content is loaded from external systems, for example from an ad network.
This might require us to work with the owner of the external system to adopt HTTPS, or it might even involve temporarily setting up a proxy to serve the external content to our users over HTTPS until the external systems are updated. Once HSTS is enabled, it cannot be disabled until the period specified in the header elapses. The decision to add your website to the Preload List is not one that should be taken lightly.
Unfortunately, not all browsers in use today support HSTS. It cannot yet be counted on as a guaranteed way to enforce a strict policy for all users, so it is important to continue to redirect users from HTTP to HTTPS and employ the other protections mentioned in this article.

Browsers have a built-in security feature to help avoid disclosure of a cookie containing sensitive information: setting the "secure" flag on a cookie will instruct a browser to only send that cookie over HTTPS.
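Python's http.cookies module can show the flag being set (a sketch; real applications usually set this through their web framework, and the token value here is illustrative):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-session-token"
cookie["session"]["secure"] = True  # instructs the browser: send over HTTPS only

header = cookie.output()
print(header)  # Set-Cookie: session=opaque-session-token; Secure
```

Without the Secure attribute, the same cookie would also ride along on any plaintext HTTP request to the site, where an eavesdropper could capture it.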
This is an important safeguard to make use of even when HSTS is enabled. There are some other risks to be mindful of that can result in accidental disclosure of sensitive information despite using HTTPS. It is dangerous to put sensitive data inside of a URL. Doing so presents a risk if the URL is cached in browser history, not to mention if it is recorded in logs on the server side. In addition, if the resource at the URL contains a link to an external site and the user clicks through, the sensitive data will be disclosed in the Referer header.
In addition, sensitive data might still be cached in the client, or by intermediate proxies if the client's browser is configured to use them and allow them to inspect HTTPS traffic. For ordinary users the contents of traffic will not be visible to a proxy, but a practice we've seen often for enterprises is to install a custom CA on their employees' systems so their threat mitigation and compliance systems can monitor traffic.
Consider using headers to disable caching to reduce the risk of leaking data due to caching. As a last step, you should verify your configuration. There is a helpful online tool for that, too.
Since the tool is updated as new attacks are discovered and protocol updates are made, it is a good idea to run it every few months.

When developing applications, you need to do more than protect your assets from attackers. You often need to protect your users from attackers, and even from themselves. The most obvious way to implement password authentication is to store username and password in a table and do lookups against it. Don't ever do this. Will it allow valid users in and keep unregistered users out?
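The kind of code being warned against looks something like this deliberately bad sketch (the table is modeled as a dict for brevity):

```python
users = {}  # username -> password, stored as-is

def signup(username: str, password: str) -> None:
    users[username] = password  # DANGEROUS: plaintext password storage

def login(username: str, password: str) -> bool:
    return users.get(username) == password

signup("alice", "littlegreenjedi")
print(login("alice", "littlegreenjedi"))  # True
print(login("mallory", "guess"))          # False
```

It works, in the sense that valid users get in and others are kept out, which is exactly what makes it so tempting.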
But here's why it's a very, very bad idea: insecure password storage creates risks from both insiders and outsiders. One often overlooked risk is that your insiders can now impersonate your users within your application. We might hope it's otherwise, but the fact is that users reuse credentials. The first time someone signs up for your site of captioned cat pictures using the same email address and password that they use for their bank login, your seemingly low-risk credentials database has become a vehicle for storing financial credentials.
If a rogue employee or an external hacker steals your credentials data, they can use them for attempted logins to major bank sites until they find the one person who made the mistake of using their credentials with wackycatcaptions. If you went down the path of creating logins for your site, option two is probably not available to you, so you are probably stuck with option one.
So what is involved in safely storing credentials? Firstly, you never want to store the password itself, but rather store a hash of the password. A cryptographic hashing algorithm is a one-way transformation from an input to an output from which the original input is, for all practical purposes, impossible to recover. More on that "practical purposes" phrase shortly. For example, your password might be "littlegreenjedi".
Applying Argon2 with a salt (more on salts later) and default command-line options gives you the hex result 9be7ddf91b7fd0dbbd5afd4ac58cae11d5fba1. Now you aren't storing the password at all, but rather this hash.
In order to validate a user's password, you just apply the same hash algorithm to the password text they send; if the hashes match, you know the password is valid. So we're done, right? The problem is that, assuming we don't vary the salt, every user with the password "littlegreenjedi" will have the same hash in our database. Many people just re-use their same old password. Lookup tables generated from the most commonly occurring passwords and their variations can be used to efficiently reverse engineer hashed passwords.
If an attacker gets hold of your password store, they can simply cross-reference a lookup table with your password hashes and are statistically likely to extract a lot of credentials in a pretty short period of time. The trick is to add a bit of unpredictability into the password hashes so they cannot be easily reverse engineered. A salt, when properly generated, can provide just that. A salt is some extra data that is added to the password before it is hashed so that two instances of a given password do not have the same hash value. The real benefit here is that it increases the range of possible hashes of a given password beyond the point where it is practical to pre-compute them.
Suddenly the hash of "littlegreenjedi" can't be predicted anymore. Now, if an attacker gets their hands on the password hash store, it is much more expensive to brute force the passwords. The salt doesn't require any special protection like encryption or obfuscation. It can live alongside the hash, or even encoded within it, as is the case with bcrypt. If your password table or file falls into attacker hands, access to the salt won't help them use a lookup table to mount an attack on the collection of hashes. A salt should be globally unique per user. A UUID will certainly work and, although probably overkill, it's generally easy to generate, if costly to store.
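A sketch using only Python's standard library (PBKDF2 here stands in for the bcrypt/scrypt/Argon2 discussion below) to show that a per-user salt makes identical passwords hash differently:

```python
import hashlib
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # unique per user; needs no special protection
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

salt1, hash1 = hash_password("littlegreenjedi")
salt2, hash2 = hash_password("littlegreenjedi")

# Same password, different salts -> different hashes, so no precomputed
# lookup table covers them both.
print(hash1 != hash2)  # True

# Verification re-applies the stored salt to the candidate password.
candidate = hashlib.pbkdf2_hmac("sha256", b"littlegreenjedi", salt1, 100_000)
print(candidate == hash1)  # True
```

The iteration count (100,000 here) is the configurable work factor discussed below; it should be tuned upward as hardware gets faster.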
Hashing and salting is a good start, but as we will see below, even this might not be enough. Sadly, not all hashing algorithms are created equal. SHA-1 and MD5 had been common standards for a long time until the discovery of low-cost collision attacks. Luckily there are plenty of alternatives that are low-collision and slow.
A slower algorithm means that a brute force attack is more time consuming and therefore costlier to run. The best widely-available algorithms are now considered to be scrypt and bcrypt. Bindings are available for several languages. Argon2 was designed specifically for the purpose of hashing passwords and is resistant to attacks using GPUs and other specialized hardware.
However, it is very new and has not yet been broadly adopted, although signs are good that it will be soon. Pay attention to how this adoption occurs, and when implementations become more widely available. When we feel comfortable recommending adoption, we'll update this evolving publication. In addition to choosing an appropriate algorithm, you want to make sure you have it configured correctly. Key derivation functions have configurable iteration counts, also known as a work factor, so that as hardware gets faster, you can increase the time it takes to brute force them.
If you want to make your application a bit more future-proof, you can add the configuration parameters in the password storage, too, along with the hash and salt. That way, if you decide to increase the work factor, you can do so without breaking existing users or having to do a migration in one shot. By including the name of the algorithm in storage, too, you could even support more than one at the same time allowing you to evolve away from algorithms as they are deprecated in favor of stronger ones.
Really, the only change to the code above is that rather than storing the password in clear text, you store the salt, the hash, and the work factor. That means when a user first chooses a password, you will want to generate a salt and hash the password with it. Then, during a login attempt, you use the salt again to generate a hash to compare with the stored hash. The example above uses the Python bcrypt library, which stores the salt and the work factor in the hash for you.
If you print out the results of hashpw, you can see them embedded in the string. Not all libraries work this way. Some output a raw hash, without salt and work factor, requiring you to store them in addition to the hash. But the end result is the same.

This might be obvious, but all the advice above applies only to situations where you are storing passwords for a service that you control.
If you are storing passwords on behalf of the user to access another system, your job is considerably more difficult. Your best bet is simply not to do it, since you have no choice but to store the password itself rather than a hash. Ideally the third party will support a more appropriate mechanism, such as SAML or OAuth, for this situation.
If not, you need to think through very carefully how you store it, where you store it, and who has access to it. It's a very complicated threat model and hard to get right.

Many sites place unreasonable limits on how long your password can be. Even if you hash and salt correctly, if your password length limit is too small, or the allowed character set too narrow, you substantially reduce the number of possible passwords and increase the probability that a password can be brute forced.
The goal, in the end, is not length but entropy. Since you can't effectively enforce how your users generate their passwords, allowing long passwords and a broad character set will leave you in good stead. If your security requirements are very stringent, you may want to think beyond password strategy and look to mechanisms like two-factor authentication, so that you aren't over-reliant on passwords for security. Both NIST and Wikipedia have very detailed explanations of the effects of character length and set limits on entropy. If you are resource constrained, you can get quite specific about the cost of breaking into your systems based on the speed of GPU clusters and the size of the keyspace, but for most situations this level of specificity just isn't necessary to find an appropriate password strategy.
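To put numbers on the effect of those limits: the entropy of a randomly generated password is length × log2(character-set size). A quick sketch (the character-set sizes are illustrative):

```python
import math

def password_entropy_bits(length: int, charset_size: int) -> float:
    """Entropy in bits of a password drawn uniformly at random
    from charset_size characters, for the given length."""
    return length * math.log2(charset_size)

# An 8-character lowercase-only password versus a 16-character
# password drawn from ~95 printable ASCII characters:
print(round(password_entropy_bits(8, 26), 1))   # → 37.6
print(round(password_entropy_bits(16, 95), 1))  # → 105.1
```

Doubling the length and widening the character set nearly triples the entropy here, which is why tight length limits and narrow character sets are so damaging.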
If we need to know the identity of our users, for example to control who receives specific content, we need to provide some form of authentication. If we want to retain information about a user between requests once they have authenticated, we will also need to support session management. Despite being well-known and supported by many full-featured frameworks, these two concerns are implemented incorrectly often enough that they have earned spot 2 in the OWASP Top 10.

Authentication is sometimes confused with authorization.
Authentication confirms that a user is who they claim to be. For example, when you log into your bank, your bank can verify it is in fact you and not an attacker trying to steal the fortune you amassed selling your captioned cat pictures site. Authorization defines whether a user is allowed to do something. Your bank may use authorization to allow you to see your overdraft limit, but not allow you to change it. Session management ties authentication and authorization together. Session management makes it possible to relate requests made by a particular user.
Without session management, users would have to authenticate during each request they sent to a web application. All three elements - authentication, authorization, and session management - apply to both human users and to services. Keeping these three separate in our software reduces complexity and therefore risk. There are many methods of performing authentication. Regardless of which method you choose, it is always wise to try to find an existing, mature framework that provides the capabilities you need.
Such frameworks have often been scrutinized over a long period of time and avoid many common mistakes. Helpfully, they often come with other useful features as well. An overarching concern to consider from the start is how to ensure credentials remain private when a client sends them across the network. The easiest, and arguably only, way to achieve this is to follow our earlier advice to use HTTPS for everything. One option is to use the simple challenge-response mechanism specified in the HTTP protocol for a client to authenticate to a server.
When your browser encounters a 401 Unauthorized response that includes information about a challenge to access the resource, it will pop up a window prompting you to enter your name and password, keeping them in memory for subsequent requests. This mechanism has some weaknesses, the most serious of which is that the only way for a user to log out is to close their browser. A safer option, which allows you to manage the lifecycle of a user's session after authentication, is to have users enter their credentials through a web form. This can be as simple as looking up a username in a database table and comparing the hash of the password using an approach we outlined in our earlier section on hashing passwords.
For example, using Devise, a popular authentication framework for Ruby on Rails, this can be done by registering a module for password authentication in the model used to represent a User, and instructing the framework to authenticate users before requests are processed by controllers.

We can rely on external service providers, where users may already have accounts, to identify them. We can also authenticate users using a variety of different factors: something they know, such as a password; something they have, such as a phone or hardware token; or something they are, such as a fingerprint. Depending on your needs, some of these options may be worth considering, while others are helpful when we want to add an extra layer of protection.
One option that offers convenience for many users is to allow them to log in using an existing account on popular services such as Facebook, Google, or Twitter, using a scheme called Single Sign-On (SSO). SSO allows users to log in to different systems using a single identity managed by an identity provider.
To achieve this, SSO relies on the external service to manage logging the user in and to confirm their identity. The user never provides any credentials to our site. SSO can significantly reduce the amount of time it takes to sign up for a site and eliminates the need for users to remember yet another username and password. However, some users may prefer to keep their use of our site private and not connect it to their identity elsewhere.
Others may not have an existing account with the external providers we support. It is preferable to also allow users to register by entering their information manually.

A single factor of authentication, such as a username and password, is sometimes not enough to keep users safe. Using other factors of authentication can add an additional layer of security to protect users in the event a password is compromised. With Two-Factor Authentication (2FA), a second, different factor of authentication is required to confirm the identity of a user.
If something the user knows, such as a username and password, is used as the first factor of authentication, a second factor could be something the user has, such as a secret code generated by software on their mobile phone or by a hardware token. Verifying a secret code sent to a user via SMS text message was once a popular way of doing this, but it is now deprecated due to the risks it presents. Applications like Google Authenticator and a multitude of other products and services can be safer and are relatively easy to implement, although any option will increase the complexity of an application and should be considered mainly when applications maintain sensitive data.
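Google Authenticator generates its codes using the TOTP scheme from RFC 6238, which derives a short-lived code from a shared secret and the current time. A minimal sketch, using only the Python standard library:

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, now=None, interval=30, digits=6):
    """RFC 6238 time-based one-time password, as generated by apps
    like Google Authenticator."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if now is None else now) // interval)
    # HOTP (RFC 4226): HMAC-SHA1 over the big-endian counter,
    # then dynamic truncation down to a short decimal code.
    mac = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# RFC 6238 test vector: this secret at time 59 yields "94287082" (8 digits).
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", now=59, digits=8))
```

The server verifies a submitted code by computing the same value from its copy of the secret; because the counter changes every 30 seconds, an intercepted code is only useful for a very short window.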
We can also use a second factor to provide additional protection when users perform sensitive actions, such as changing their password or transferring money. For example, some online merchants require you to re-enter details from your credit card when making a purchase to a newly added shipping address. It is also helpful to require users to re-enter their passwords when updating their personal information.

When a user makes a mistake entering their username or password, we might see a website respond with a message like this: "The user ID is unknown."
Revealing whether a user exists can help an attacker enumerate accounts on our system to mount further attacks against them, or, depending on the nature of the site, may compromise the user's privacy. A better, more generic response might be: "Incorrect user ID or password." Users can be enumerated through many other functions of a web application, for example when signing up for an account or resetting their password.
It is good to be mindful of this risk and avoid disclosing unnecessary information. One alternative is to send an email with a link to continue registration, or a password-reset link, after a user enters their email address, instead of outputting a message that indicates whether the account exists.

An attacker might try to conduct a brute force attack, guessing account passwords until they find one that works. With attackers increasingly using large networks of compromised systems, referred to as botnets, to conduct these attacks, finding an effective protection that does not impact service continuity is a challenging task.
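One common building block is to throttle failed login attempts per account, for example with an exponentially growing delay. A naive in-memory sketch (a real deployment would need shared state across servers, and would typically also weigh source IP, CAPTCHAs, and monitoring):

```python
import time

class LoginThrottle:
    """Track failed login attempts per account and enforce an
    exponentially growing delay before further attempts are allowed."""

    def __init__(self, base_delay=1.0, threshold=3):
        self.base_delay = base_delay  # seconds after the first lockout
        self.threshold = threshold    # free attempts before throttling
        self.failures = {}            # username -> (count, last_failure_time)

    def allowed(self, username, now=None):
        now = time.time() if now is None else now
        count, last = self.failures.get(username, (0, 0.0))
        if count < self.threshold:
            return True
        # The required delay doubles with each failure past the threshold.
        delay = self.base_delay * (2 ** (count - self.threshold))
        return now - last >= delay

    def record_failure(self, username, now=None):
        now = time.time() if now is None else now
        count, _ = self.failures.get(username, (0, 0.0))
        self.failures[username] = (count + 1, now)

    def record_success(self, username):
        self.failures.pop(username, None)
```

Because the delay grows exponentially, legitimate users who mistype a couple of times are barely affected, while a botnet hammering one account quickly hits multi-minute waits.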