Encoding & Filtering
Filtering with Regex, Types of encoding, Bypass WAF and More
This table is a character encoding chart that is useful in explaining which characters are “safe” and which characters should be encoded in URLs.
Some commonly encoded characters are:
Character | Purpose in a URL | Encoding
'#' | Separates anchors | %23
'?' | Separates the query string | %3F
'&' | Separates query elements | %26
'%' | Indicates an encoded character | %25
'/' | Separates domain and directories | %2F
'+' | Indicates a space | %2B
' ' (space) | Not recommended | %20 or +
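As a quick check of the encodings listed above, here is a minimal Python sketch using the standard library's urllib.parse. The helper name percent_encode is an assumption made for illustration; it is not part of the original material.

```python
from urllib.parse import quote, quote_plus, unquote


def percent_encode(text: str) -> str:
    """Percent-encode every reserved character (hypothetical helper).

    safe="" tells quote() not to leave "/" unencoded, which it does by default.
    """
    return quote(text, safe="")


# Print the encoding of each character from the chart.
for ch in "#?&%/+ ":
    print(repr(ch), "->", percent_encode(ch))

# quote_plus() uses the form-style convention where a space becomes "+".
print(quote_plus("a b"))
# unquote() reverses percent-encoding.
print(unquote("%2F"))
```

Note that quote() encodes a space as %20, while quote_plus() produces +, which matches the two alternatives given for the space character.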
Documents transmitted via HTTP can declare their character encoding through the charset parameter of the Content-Type HTTP header.
Define character encoding using HTTP
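To illustrate how the charset parameter is read from a Content-Type value, here is a small Python sketch. It reuses the standard library's email.message parser, which understands this header syntax; the function name charset_from_content_type and the sample header values are assumptions for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lower-cased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```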
HTML4 and HTML5 specifications about special characters
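Character references must start with a U+0026 AMPERSAND character (&):
Type | Format | Example
Named entity | & + named character reference + ; | &lt;
Numeric decimal | & + # + decimal code point + ; | &#60;
Numeric hexadecimal | & + #x + hexadecimal code point + ; | &#x3c; / &#X3C;
Some variations:
Type | Variation | Example
Numeric decimal | No terminator (;) | &#60
Numeric decimal | One or more zeroes before the code | &#060; / &#0000060;
Numeric hexadecimal | No terminator (;) | &#x3c
Numeric hexadecimal | One or more zeroes before the code | &#x03c; / &#x0003c;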
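The sketch below, in Python, shows why these variations matter for filtering: an HTML5-style decoder maps every form to the same character. It uses html.unescape from the standard library; the list of variants is chosen here for illustration and all of them encode U+003C (the less-than sign).

```python
import html

# Different character references that all stand for "<" (U+003C).
variants = [
    "&lt;",        # named entity
    "&#60;",       # numeric decimal
    "&#x3c;",      # numeric hexadecimal, lower case
    "&#X3C;",      # numeric hexadecimal, upper case
    "&#60",        # decimal without the ";" terminator
    "&#0000060;",  # decimal with leading zeroes
    "&#x3c",       # hexadecimal without the terminator
    "&#x0003c;",   # hexadecimal with leading zeroes
]

decoded = [html.unescape(v) for v in variants]
for raw, out in zip(variants, decoded):
    print(raw, "->", out)
```

A filter that only looks for the literal string &lt; would miss the other seven forms.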
Base 36 Encoding Scheme
Base36 is the most compact case-insensitive alphanumeric numeral system that uses ASCII characters. Its alphabet contains all digits [0-9] and Latin letters [A-Z].
It's used in many real-world scenarios:
Reddit uses it to identify both posts and comments.
Some URL shortening services, like TinyURL, use Base36 integers as compact, alphanumeric identifiers.
PHP: the base_convert function converts a number between bases, including base 36.
JavaScript: two functions are used, parseInt(value, 36) to decode and Number.prototype.toString(36) to encode.
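The same conversion can be sketched in Python. Decoding is built in through int(text, 36); encoding needs a short loop. The function names to_base36 and from_base36 are assumptions for illustration.

```python
import string

# Base36 alphabet: digits followed by letters (case-insensitive when decoding).
DIGITS = string.digits + string.ascii_lowercase


def to_base36(number: int) -> str:
    """Encode a non-negative integer as a Base36 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    out = []
    while number:
        number, remainder = divmod(number, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def from_base36(text: str) -> int:
    """Decode a Base36 string; int() accepts upper and lower case."""
    return int(text, 36)


print(to_base36(1295), from_base36("ZZ"))
```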
Base64 Encoding Scheme
Base64 is one of the most widespread binary-to-text encoding schemes to date. It was designed to allow binary data to be represented as ASCII string text.
The alphabet of the Base64 encoding scheme is composed of digits [0-9] and Latin letters, both upper and lower case [a-zA-Z], for a total of 62 values. The plus (+) and slash (/) characters complete the set of 64.
The algorithm divides the message into groups of 6 bits and maps each group to the corresponding ASCII character, following the conversion table. That's why there are 64 allowed characters (2 raised to the 6th power = 64).
The output is emitted in blocks of four characters, each block covering three input bytes. If the final block is missing one 6-bit group (two input bytes left over), it is padded with =.
If it is missing two groups (one input byte left over), it is padded with ==.
PHP: the base64_encode and base64_decode functions are based on the MIME Base64 implementation.
JavaScript: many browsers handle Base64 natively through the btoa and atob functions.
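To make the 6-bit grouping and the padding rule concrete, here is a minimal Python sketch that encodes by hand and compares the result with the standard library's base64 module. The function name b64_manual is an assumption; it is a teaching aid, not a replacement for the library.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_manual(data: bytes) -> str:
    """Encode bytes by slicing the bit stream into 6-bit groups."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad the bit stream with zeroes so it splits evenly into 6-bit groups.
    bits += "0" * (-len(bits) % 6)
    chars = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # Pad the output with "=" up to a multiple of four characters.
    return chars + "=" * (-len(chars) % 4)


for sample in (b"Man", b"Ma", b"M"):
    print(sample, b64_manual(sample), base64.b64encode(sample).decode())
```

Three input bytes need no padding, two bytes produce one =, and one byte produces ==.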
Unicode, also known as the ISO/IEC 10646 Universal Character Set, can expose web applications to security attacks such as filter bypasses.
UTF = Unicode Transformation Format:
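As an illustration, the following Python sketch prints the bytes that the same character produces under three transformation formats. The helper name utf_forms and the sample characters are assumptions; the big-endian codecs are used so that no byte order mark is added.

```python
def utf_forms(ch: str) -> dict:
    """Return the hex bytes of one character in UTF-8, UTF-16 and UTF-32."""
    return {name: ch.encode(name).hex() for name in ("utf-8", "utf-16-be", "utf-32-be")}


print(utf_forms("<"))
print(utf_forms("€"))
```

The same code point can therefore reach an application as very different byte sequences, which is why a filter has to know the encoding it is inspecting.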
In typography, a homoglyph is one of two or more characters, or glyphs, with shapes that either appear identical or cannot be differentiated by quick visual inspection. -Wikipedia
Homoglyphs can bypass anti cross-site scripting and SQL injection filters.
Software processes can also transform characters and strings in other ways, such as normalization, canonicalization, and best-fit mapping.
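A minimal Python sketch of both effects, assuming a deliberately naive filter: the function naive_filter and the sample strings are hypothetical. It shows a fullwidth payload that passes the filter and turns into markup after NFKC normalization, and a Cyrillic homoglyph that looks like a Latin letter but compares unequal.

```python
import unicodedata


def naive_filter(value: str) -> bool:
    """Hypothetical filter: accept input unless it contains '<script'."""
    return "<script" not in value.lower()


# U+FF1C and U+FF1E are the fullwidth forms of "<" and ">".
payload = "\uff1cscript\uff1e"
normalized = unicodedata.normalize("NFKC", payload)

print(naive_filter(payload))      # checked before normalization
print(normalized)                 # what a later processing step may produce
print(naive_filter(normalized))   # checked after normalization

# Homoglyph: U+0430 CYRILLIC SMALL LETTER A looks like the Latin "a".
spoofed = "p\u0430ypal"
print(spoofed == "paypal")
```

The point of the sketch is ordering: a filter that runs before normalization inspects different characters than the ones the application ends up using.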
It's common to abuse multiple encodings to bypass security measures.
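Double URL encoding is one simple case of this. The Python sketch below assumes a filter that decodes its input exactly once while a later component decodes again; the helper names and the payload are hypothetical.

```python
from urllib.parse import unquote


def decode_until_stable(value: str, max_rounds: int = 5) -> str:
    """Repeatedly URL-decode until the value stops changing (bounded)."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


payload = "%253Cscript%253E"   # "%25" is the encoding of "%"
once = unquote(payload)        # what a single-decode filter sees
twice = unquote(once)          # what a second decoding step produces

print(once)
print(twice)
print(decode_until_stable(payload))
```

A filter that inspects only the once-decoded value never sees an angle bracket, even though the twice-decoded value contains one.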
A common and often recommended best practice to protect web applications against malicious attacks is the use of specific input filtering and output encoding controls.
These kinds of controls range from naive blacklists to carefully designed, highly restrictive whitelists. In the real world, most deployments sit somewhere in the middle.
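The difference between the two approaches can be sketched in Python as follows. Both filters, their rule lists, and the username format are assumptions chosen for illustration, not rules taken from any real product.

```python
import re

# Naive blacklist: reject input containing any known-bad substring.
BLACKLIST = ("<script", "javascript:", "onerror=")

# Restrictive whitelist: accept only a narrow, explicitly allowed format.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def blacklist_ok(value: str) -> bool:
    lowered = value.lower()
    return not any(bad in lowered for bad in BLACKLIST)


def whitelist_ok(value: str) -> bool:
    return USERNAME_RE.fullmatch(value) is not None


attack = "<img src=x onload=alert(1)>"
print(blacklist_ok(attack))      # slips through: "onload=" is not on the list
print(whitelist_ok(attack))      # rejected: not a valid username
print(whitelist_ok("alice_01"))  # accepted
```

The blacklist fails open on anything its author did not anticipate, while the whitelist fails closed.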
Controls can be implemented at different layers of a web application. They can take the form of libraries and APIs, written in the best case by internal specialists or external organizations, like ESAPI by OWASP.
Security controls are also built into most common browsers.
Generally, these solutions fall into the IDS and IPS world, but for web applications the most common choice is the Web Application Firewall (WAF).
Regular expressions (RegEx) are the standard way to define filter rules. Mastering them is fundamental to understanding how to bypass filters, because regular expressions are extremely powerful.
A regular expression is a special sequence of characters used to describe a search pattern.
→ regular expression = regex
→ pattern matched = match
DFA: awk, egrep, MySQL, Procmail
NFA: .NET languages, Java, Perl, PHP, Python, Ruby, PCRE library, vi, grep, less, more
Non-printing characters:
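As an example of matching non-printing characters, this Python sketch locates ASCII control characters with hexadecimal escapes inside a character class. The helper name find_non_printing and the chosen range (0x00 to 0x1F plus 0x7F) are assumptions for illustration.

```python
import re

# ASCII control characters: 0x00-0x1F and DEL (0x7F).
NON_PRINTING = re.compile(r"[\x00-\x1f\x7f]")


def find_non_printing(text: str) -> list:
    """Return (position, escaped form) for every control character found."""
    return [(m.start(), f"\\x{ord(m.group()):02x}") for m in NON_PRINTING.finditer(text)]


print(find_non_printing("a\tb\n"))
```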
Match Unicode Code Point:
example:
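A minimal sketch in Python's re syntax, where a code point is written as \uXXXX (four hex digits) or \xXX (two hex digits). Other engines use different notations, for instance \x{...} in Perl and PCRE, so treat the escape form as specific to Python here. The helper name contains_lt is an assumption.

```python
import re

# \u003c is the code point of "<"; \x73 is the code point of "s".
LT_BY_CODE_POINT = re.compile(r"\u003c")
SCRIPT_PREFIX = re.compile(r"\u003c\x73cript")


def contains_lt(text: str) -> bool:
    """True if the text contains U+003C, matched by code point."""
    return LT_BY_CODE_POINT.search(text) is not None


print(contains_lt("a<b"))
print(contains_lt("a\uff1cb"))  # fullwidth "<" is a different code point
print(SCRIPT_PREFIX.fullmatch("<script") is not None)
```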
Meta-sequence Quality:
Match Unicode Category:
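Python's built-in re module does not support the \p{...} category syntax found in engines such as PCRE, so this sketch checks categories with unicodedata.category instead. The helper name all_letters is an assumption for illustration.

```python
import unicodedata


def all_letters(text: str) -> bool:
    """True if every character belongs to a Unicode letter category (L*)."""
    return all(unicodedata.category(ch).startswith("L") for ch in text)


for ch in ("A", "a", "1", "<", " "):
    print(repr(ch), unicodedata.category(ch))

print(all_letters("abc\u00e9"))
print(all_letters("ab1"))
```

Categories are two-letter codes: Lu and Ll are upper and lower case letters, Nd is a decimal digit, Sm is a math symbol, and Zs is a space separator.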
ByPass WAFs
Citrix Netscaler sets distinctive cookies in its HTTP responses, such as ns_af, citrix_ns_id, or names starting with NSC_.
F5 BIG-IP ASM (Application Security Manager) uses cookies whose names start with TS, followed by a string that matches a fixed regular expression.
Barracuda uses two cookies: barra_counter_session and BNI__BARRACUDA_LB_COOKIE.
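Cookie-based fingerprinting can be sketched in Python as a simple prefix lookup. The cookie names come from the text above; the function name, the data structure, and the sample inputs are assumptions. F5 BIG-IP ASM is left out because its exact pattern is not given here.

```python
# Cookie name prefixes per vendor, taken from the descriptions above.
COOKIE_SIGNATURES = {
    "Citrix Netscaler": ("ns_af", "citrix_ns_id", "NSC_"),
    "Barracuda": ("barra_counter_session", "BNI__BARRACUDA_LB_COOKIE"),
}


def guess_waf_from_cookies(cookie_names) -> list:
    """Return the vendors whose cookie prefixes appear in the response."""
    hits = set()
    for name in cookie_names:
        for vendor, prefixes in COOKIE_SIGNATURES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if name.startswith(prefixes):
                hits.add(vendor)
    return sorted(hits)


print(guess_waf_from_cookies(["NSC_abc", "sessionid"]))
```

A match is only a hint: cookie names can be changed or removed by the operator.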
Some WAFs rewrite HTTP headers, usually modifying the Server header to deceive attackers.
Some WAFs change the HTTP response code when a request is hostile.
It's also possible to detect a WAF from the response body. Example signatures:
…Mod_Security…
…AQTRONIX WebKnight …
dotDefender Blocked your Request
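A body-based check can be sketched in Python as a case-insensitive substring search. The three marker strings are the ones listed above; the function name, the vendor labels, and the sample body are assumptions for illustration.

```python
# Lower-cased marker strings taken from the examples above.
BODY_SIGNATURES = {
    "ModSecurity": "mod_security",
    "WebKnight": "aqtronix webknight",
    "dotDefender": "dotdefender blocked your request",
}


def guess_waf_from_body(body: str) -> list:
    """Return the products whose marker string appears in a block page."""
    lowered = body.lower()
    return sorted(name for name, marker in BODY_SIGNATURES.items() if marker in lowered)


sample = "<html><body><h1>dotDefender Blocked your Request</h1></body></html>"
print(guess_waf_from_body(sample))
```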
Drop action: some WAFs drop the connection when they detect a malicious request.
mod_security supports this through its drop action.
wafw00f is a tool written in Python that can detect up to 20 different WAF products.
The techniques used to detect a WAF are similar to those we have seen previously:
Cookies
Server Cloaking
Response Codes
Drop Action
Pre-Built-in Rules
→ nmap --script=http-waf-fingerprint -p 80
Browsers are the primary means used to address client-side attacks.
NoScript Security Suite is a whitelist-based security tool that disables all executable web content (JavaScript, Java, Flash, Silverlight, …) and lets the user choose which sites are trusted, allowing those sites to use these technologies.
The XSS Filter is implemented in the c:\windows\system32\mshtml.dll library. Ways to inspect:
Once a malicious injection is detected, the XSS Filter modifies the malicious part of the payload by inserting the '#' character in place of the neuter character defined in the rules.
Web sites that choose to opt out of this protection can send the HTTP response header X-XSS-Protection: 0.
The XSS Auditor is enabled by default in browsers such as Chrome, Opera, and Safari.
The filter analyzes both inbound requests and outbound responses. If, in the parsed HTML data, it finds executable code within the response, it stops the script and generates a console alert similar to the following: The XSS Auditor refused to execute a script in …
However, there are many bypasses as well.
Nmap contains a script that tries to detect the presence of a web application firewall, its type and version.