Understanding Hashing & Encoding for Developers
A practical guide explaining the critical differences between hashing and encoding, common misconceptions, security implications, and when to use each technique.
Hashing vs Encoding: The Core Difference
Hashing: one-way transformation
- ✓ Irreversible (cannot get original data back)
- ✓ Always produces same output for same input
- ✓ Fixed length output regardless of input size
- ✓ Used for: passwords, data integrity, digital signatures
Example: password123 → ef92b778... (SHA-256)
Encoding: two-way transformation
- ✓ Reversible (decode to get original data)
- ✓ No security properties (not for secrets)
- ✓ Variable length output based on input
- ✓ Used for: data transport, URL safety, binary-to-text
Example: Hello → SGVsbG8= (Base64) → Hello
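The contrast above can be sketched in a few lines of Node.js. This is a minimal illustration using the built-in `node:crypto` module and `Buffer`; the helper names `sha256Hex`, `toBase64`, and `fromBase64` are assumptions made for this example, not part of any library.

```javascript
const { createHash } = require('node:crypto');

// Hashing: one-way, and the digest length does not depend on the input size.
function sha256Hex(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Encoding: reversible, and the output grows with the input.
function toBase64(text) {
  return Buffer.from(text, 'utf8').toString('base64');
}

function fromBase64(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

console.log(sha256Hex('Hello').length);              // 64 hex characters
console.log(sha256Hex('Hello'.repeat(1000)).length); // still 64
console.log(toBase64('Hello'));                      // SGVsbG8=
console.log(fromBase64('SGVsbG8='));                 // Hello
```

There is no `fromSha256` counterpart to write: the only way back from a digest is to guess inputs and hash them.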
Hashing Algorithms Explained
MD5 (Message Digest 5)
- Output: 128-bit hash (32 hexadecimal characters)
- Status: ❌ Cryptographically broken
- Speed: Very fast
- Use cases: Only for non-security purposes like checksums or cache keys
// MD5 Example
Input: "password123"
Output: "482c811da5d5b4bc6d497ffa98491e38"
Input: "The quick brown fox jumps over the lazy dog"
Output: "9e107d9d372bb6826bd81d3542a419d6"
When MD5 is acceptable: Detecting accidental corruption in file downloads, cache keys, and non-security unique identifiers. If no attacker can control the input, MD5's speed can be useful.
SHA-256 (Secure Hash Algorithm 256-bit)
- Output: 256-bit hash (64 hexadecimal characters)
- Status: ✓ Secure
- Speed: Fast (hardware-accelerated on modern CPUs)
- Use cases: HMAC request signing, blockchain, certificates, data integrity (not password storage on its own; see Password Hashing Best Practices)
// SHA-256 Example
Input: "password123"
Output: "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
Input: "The quick brown fox jumps over the lazy dog"
Output: "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
Why SHA-256 is preferred: Part of the SHA-2 family and widely adopted in TLS certificates, Bitcoin mining, and Git's newer SHA-256 object format. No known practical attacks. Good balance of security and performance.
SHA-512 (Secure Hash Algorithm 512-bit)
- Output: 512-bit hash (128 hexadecimal characters)
- Status: ✓ Secure
- Speed: Often faster than SHA-256 on 64-bit CPUs that lack dedicated SHA-256 instructions
- Use cases: High-security applications, long-term data integrity, digital signatures
// SHA-512 Example
Input: "password"
Output: "b109f3bbbc244eb82441917ed06d618b9008dd09b3befd1b5e07394c706a8bb980b1d7785e5976ec049b46df5f1326af5a2ea6d103fd07c95385ffab0cacbc86"
Input: "The quick brown fox jumps over the lazy dog"
Output: "07e547d9586f6a73f73fbac0435ed76951218fb7d0c8d788a309d785436bbb642e93a252a954f23912547d1e8a3b5ed6e1bfd7097821233fa0538f3db854fee6"
SHA-512 benefits: Larger hash space makes collision attacks even more impractical. Its 64-bit operations suit 64-bit processors, so it can outperform SHA-256 on servers without SHA-256 hardware acceleration. Used in high-security environments and when future-proofing is important.
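The output sizes listed for the three algorithms can be checked directly. The sketch below assumes a Node.js runtime whose OpenSSL build still exposes MD5 (some FIPS-restricted builds do not); `hexDigest` is a hypothetical helper name introduced for this example.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: hex digest of a UTF-8 string for a named algorithm.
function hexDigest(algorithm, text) {
  return createHash(algorithm).update(text, 'utf8').digest('hex');
}

const input = 'The quick brown fox jumps over the lazy dog';
for (const algorithm of ['md5', 'sha256', 'sha512']) {
  const digest = hexDigest(algorithm, input);
  // Each hex character carries 4 bits, so bits = characters * 4.
  console.log(`${algorithm}: ${digest.length} hex chars (${digest.length * 4} bits)`);
}
```

Changing a single character of the input yields a completely different digest of the same length, which is what makes these functions useful for integrity checks.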
Password Hashing Best Practices
Why Plain SHA-256 Is Not Enough for Passwords
- Too fast: SHA-256 can compute billions of hashes per second on modern GPUs, making brute-force attacks feasible
- Rainbow tables: Pre-computed hash tables can crack unsalted passwords instantly
- No salt management: Without proper salting, identical passwords have identical hashes
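The salting point above can be illustrated with a short sketch. It uses scrypt from Node's built-in `node:crypto` module as a stand-in for a dedicated password-hashing library, purely so the example has no third-party dependencies. The `salt:hash` storage format and the helper names are assumptions made for this example.

```javascript
const { createHash, randomBytes, scryptSync, timingSafeEqual } = require('node:crypto');

// Fast, unsalted hash: identical passwords always give identical digests.
function unsaltedSha256(password) {
  return createHash('sha256').update(password, 'utf8').digest('hex');
}

// Salted, deliberately slow hash. The "salt:hash" layout is an assumption
// of this sketch, not a standard format.
function hashPassword(password) {
  const salt = randomBytes(16);
  const derived = scryptSync(password, salt, 32);
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(actual, expected);
}

const first = hashPassword('password123');
const second = hashPassword('password123');
console.log(unsaltedSha256('password123') === unsaltedSha256('password123')); // true
console.log(first === second);                     // false: each call uses a fresh salt
console.log(verifyPassword('password123', first)); // true
console.log(verifyPassword('wrong', first));       // false
```

Because every stored value carries its own random salt, a precomputed table built for one salt is useless against any other user's hash.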
Proper Password Hashing (2024 Standards)
// ❌ WRONG - Never do this! (sha256 stands for any plain, unsalted SHA-256 helper)
const weakHash = sha256(password)
// ✓ CORRECT - Use bcrypt (the second argument is the cost factor)
import bcrypt from 'bcrypt'
const bcryptHash = await bcrypt.hash(password, 10)
// ✓ CORRECT - Use Argon2 (OWASP recommended)
import argon2 from 'argon2'
const argon2Hash = await argon2.hash(password)
// Both include automatic salting and configurable work factors
When SHA-256/SHA-512 ARE Appropriate
- File integrity verification: Checksums for downloads (combined with signatures)
- Git commits: Unique identifiers for commits (Git uses SHA-1, moving to SHA-256)
- API request signatures: HMAC-SHA256 for verifying API requests (e.g., AWS signatures)
- Digital signatures: Part of RSA/ECDSA signature schemes
- Blockchain: Proof-of-work (Bitcoin uses double SHA-256)
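As one example from the list above, HMAC-SHA256 request signing might look like the following sketch. It uses only Node's built-in `node:crypto`; the function names, the shared secret, and the message layout are assumptions for illustration and do not reproduce any particular provider's signing scheme.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Sign a message with a secret shared between client and server.
function sign(secret, message) {
  return createHmac('sha256', secret).update(message, 'utf8').digest('hex');
}

// Recompute the signature on the server and compare in constant time.
function verify(secret, message, signatureHex) {
  const expected = Buffer.from(sign(secret, message), 'hex');
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

const secret = 'example-shared-secret'; // hypothetical value
const message = 'POST\n/orders\n{"item":42}'; // hypothetical canonical request
const signature = sign(secret, message);

console.log(verify(secret, message, signature));                    // true
console.log(verify(secret, message + ' ', signature));              // false: message changed
console.log(verify('another-secret', message, signature));          // false: wrong key
```

Unlike a bare SHA-256 digest, the signature cannot be recomputed by someone who can read the message but does not know the secret.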
Encoding Explained
Base64 Encoding
Converts binary data to ASCII text using 64 characters (A-Z, a-z, 0-9, +, /), with = as padding. Not secure: anyone can decode it.
Input: "Hello World"
Output: "SGVsbG8gV29ybGQ="
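A minimal Node.js sketch of the binary-to-text role described above: raw bytes that are not valid text are carried inside a JSON string and recovered unchanged. The byte values and the `data` field name are arbitrary choices for this example.

```javascript
// Raw bytes that cannot be placed in JSON as-is.
const bytes = Buffer.from([0xfb, 0xff, 0xfe, 0x00]);

// Encode to Base64 so the bytes survive inside a JSON string.
const payload = JSON.stringify({ data: bytes.toString('base64') });
console.log(payload); // {"data":"+//+AA=="}

// Decode on the receiving side to get the exact original bytes back.
const restored = Buffer.from(JSON.parse(payload).data, 'base64');
console.log(restored.equals(bytes)); // true

// The text example from above round-trips the same way.
const encoded = Buffer.from('Hello World', 'utf8').toString('base64');
console.log(encoded);                                          // SGVsbG8gV29ybGQ=
console.log(Buffer.from(encoded, 'base64').toString('utf8'));  // Hello World
```

Every 3 input bytes become 4 output characters, so Base64 increases size by roughly a third.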
// Common use cases:
- Email attachments (MIME encoding)
- Embedding images in HTML/CSS (data URIs)
- JWTs (header and payload are Base64URL-encoded, a URL-safe Base64 variant)
- Storing binary data in JSON/XML
URL Encoding (Percent Encoding)
Converts special characters to %XX format for safe transmission in URLs. Spaces become %20, ampersands become %26, etc.
Input: "Hello World!"
Output: "Hello%20World%21"
Input: "user@example.com"
Output: "user%40example.com"
// Query parameters
Original: ?search=hello world&filter=active
Encoded: ?search=hello%20world&filter=active
Why it matters: URLs can only contain certain characters. Characters such as spaces, ampersands, and question marks have specific meanings in URLs, so they must be encoded when they appear as data (for example inside a query value) to avoid breaking the URL structure.
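In JavaScript, the built-in `encodeURIComponent` handles this for individual values. The sketch below is illustrative: `buildSearchUrl` and the `https://example.com/items` address are hypothetical. Note that `encodeURIComponent` leaves a few characters such as `!` unescaped, so its output for "Hello World!" differs slightly from the stricter example above while remaining a valid URL component.

```javascript
// Hypothetical helper: encode each value separately, then join with literal & and =.
function buildSearchUrl(base, term, filter) {
  return `${base}?search=${encodeURIComponent(term)}&filter=${encodeURIComponent(filter)}`;
}

console.log(buildSearchUrl('https://example.com/items', 'hello world', 'active'));
// https://example.com/items?search=hello%20world&filter=active

console.log(encodeURIComponent('user@example.com')); // user%40example.com

// An ampersand inside a value is data, so it must not split the query string.
console.log(encodeURIComponent('a&b=c'));            // a%26b%3Dc

console.log(decodeURIComponent('hello%20world'));    // hello world
```

Encoding the whole URL in one call would also escape the structural `?`, `&`, and `=`, which is why each value is encoded on its own.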
HTML Entity Encoding
Converts special HTML characters to entities to prevent XSS attacks and display reserved characters.
Input: "<script>alert('XSS')</script>"
Output: "&lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;"
Common entities:
< → &lt;
> → &gt;
& → &amp;
" → &quot;
' → &#39;
Security critical: Always HTML-encode user input before displaying it in web pages. This prevents cross-site scripting (XSS) attacks where attackers inject malicious scripts.
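A minimal escaping function for the five characters listed above might look like this. `escapeHtml` is a hypothetical helper written for illustration; in practice a templating engine or framework usually performs this step automatically, and this sketch only covers text placed in HTML element content or quoted attribute values.

```javascript
// Map each reserved character to its entity. "&" is handled in the same
// single pass, so entities produced here are never escaped twice.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml("<script>alert('XSS')</script>"));
// &lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;

console.log(escapeHtml('Tom & "Jerry"'));
// Tom &amp; &quot;Jerry&quot;
```

The browser renders the escaped string as visible text instead of parsing it as markup, so the injected script never runs.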
Real-World Security Mistakes
- Using MD5 for passwords - Attackers can crack common MD5-hashed passwords in seconds using rainbow tables or GPU brute force.
- Using Base64 to "hide" API keys - Base64 is encoding, not encryption. Any developer can decode it instantly.
- Not salting password hashes - Same passwords produce same hashes, making bulk attacks effective.
- Forgetting URL encoding in query parameters - Leads to broken URLs and potential security issues (injection attacks).
- Not HTML-encoding user input - Direct path to XSS vulnerabilities.
Decision Matrix: What to Use When
| Scenario | Technique | Algorithm |
|---|---|---|
| Storing user passwords | Hashing | bcrypt, Argon2, scrypt |
| Verifying file integrity | Hashing | SHA-256, SHA-512 |
| API request signatures | Hashing | HMAC-SHA256 |
| Sending binary data in JSON | Encoding | Base64 |
| URL query parameters | Encoding | URL encoding (percent encoding) |
| Displaying user input in HTML | Encoding | HTML entity encoding |
| Storing sensitive data | Encryption | AES-256, RSA (not hashing or encoding!) |
Key Takeaways
- Hashing is one-way and irreversible - Use for passwords (with bcrypt/Argon2), data integrity, signatures
- Encoding is two-way and reversible - Use for data transport, not security
- MD5 is broken for security - Only use for checksums in non-adversarial contexts
- SHA-256/SHA-512 are secure - But use specialized algorithms (bcrypt, Argon2) for passwords
- Base64 is not encryption - Anyone can decode it; don't use for secrets
- Always encode user input - HTML encoding prevents XSS, URL encoding prevents broken links
Understanding these fundamentals prevents common security vulnerabilities. When in doubt: hash for integrity and authentication, encode for transport and display, encrypt for confidentiality.