HTTP Feature Engineering

Analyze An Example HTTP GET Request

From the captured HTTP packet, we can extract the following network-layer features:

  1. Source IP Address: 127.0.0.1
  2. Source Port: 38998
  3. Destination IP Address: 127.0.0.1
  4. Destination Port: 8000
  5. Timestamp: 3.836109 (This is a relative timestamp, indicating that the packet was captured 3.836109 seconds after the start of the capture)

Additionally, more HTTP layer information can be extracted, such as:

  1. HTTP Request Line:
    • Method: GET
    • Request URI: /vulnerabilities/brute/?username=admin&password=admin&Login=Login
    • HTTP Version: HTTP/1.1
  2. HTTP Headers:
    • Host: 127.0.0.1:8000
    • Cookie: PHPSESSID=1f913547813995a657325d1d6f796132; security=medium
    • User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
    • Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
    • Accept-Language: en-US,en;q=0.5
    • Accept-Encoding: gzip, deflate, br
    • Connection: close
    • Referer: http://127.0.0.1:8000/vulnerabilities/brute/
    • Upgrade-Insecure-Requests: 1
    • Sec-Fetch-Dest: document
    • Sec-Fetch-Mode: navigate
    • Sec-Fetch-Site: same-origin
    • Sec-Fetch-User: ?1
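
The request URI in this example carries the login form fields in its query string, so it can be split further into a path and individual parameters. The following is a minimal sketch using only Python's standard library; the function name `split_request_uri` is hypothetical and not part of any existing tool.

```python
from urllib.parse import urlsplit, parse_qs

def split_request_uri(uri: str) -> dict:
    """Split a request URI into its path and a dict of query parameters."""
    parts = urlsplit(uri)
    # parse_qs maps each parameter name to a list of values,
    # because a name may legally appear more than once.
    params = parse_qs(parts.query, keep_blank_values=True)
    return {"path": parts.path, "params": params}

uri = "/vulnerabilities/brute/?username=admin&password=admin&Login=Login"
features = split_request_uri(uri)
print(features["path"])
print(features["params"])
```

Keeping the values as lists preserves repeated parameters, which may matter when the same field is submitted more than once.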

Feature Engineering: HTTP Request

An overview and examples of these features:

Basic Network Layer Features

  1. Source IP Address
  2. Source Port
  3. Destination IP Address
  4. Destination Port
  5. Timestamp
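
As an illustration, these five fields could be held in a small record and converted to numbers before being passed to a model. This is only a sketch of one possible encoding: the class name `NetworkFeatures`, the method `as_numeric`, and the choice to represent an IP address as an integer are assumptions, and the sample values are the ones from the example packet.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class NetworkFeatures:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    timestamp: float  # seconds relative to the start of the capture

    def as_numeric(self) -> dict:
        """Encode the fields as numbers (one possible encoding, not the only one)."""
        src = ipaddress.ip_address(self.src_ip)
        dst = ipaddress.ip_address(self.dst_ip)
        return {
            "src_ip_int": int(src),
            "src_port": self.src_port,
            "dst_ip_int": int(dst),
            "dst_port": self.dst_port,
            "timestamp": self.timestamp,
            "src_is_loopback": src.is_loopback,
        }

sample = NetworkFeatures("127.0.0.1", 38998, "127.0.0.1", 8000, 3.836109)
row = sample.as_numeric()
print(row)
```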

HTTP Request Layer Features

  1. HTTP Request Line
    • Request Method: Such as GET, POST, etc.
    • Request URI: The path to the requested resource
    • HTTP Version: Such as HTTP/1.1, HTTP/2, etc.
  2. HTTP Header Fields
    • Host: The domain name or IP address of the server
    • User-Agent: Information about the client software (usually browser information)
    • Accept: MIME types that the client can handle
    • Accept-Language: Languages that the client can handle
    • Accept-Encoding: Encoding methods that the client can handle (e.g., gzip, deflate, etc.)
    • Connection: Connection management information (e.g., keep-alive, close, etc.)
    • Referer: URL of the referring page
    • Cookie: Cookie information sent by the client
    • Content-Type (requests with a body, such as POST): Format of the data in the request body (e.g., application/x-www-form-urlencoded, multipart/form-data, application/json, etc.)
    • Content-Length (requests with a body, such as POST): Length of the request body in bytes
  3. HTTP Request Body (requests with a body, such as POST)
    • Request body data: Such as form data, JSON data, etc.
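
The fields listed above can be pulled out of a raw request with a small parser. The following is a minimal sketch, not a full HTTP parser: it assumes CRLF line endings and a complete request held in memory, keeps only the last value of a repeated header, and the function name `parse_http_request` is hypothetical.

```python
def parse_http_request(raw: str) -> dict:
    """Parse a raw HTTP/1.x request into request line, headers, and body."""
    # Headers and body are separated by the first blank line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line has three space-separated parts.
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as
        # "127.0.0.1:8000" stay intact.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version,
            "headers": headers, "body": body}

raw = (
    "GET /example/path?query=param HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
request = parse_http_request(raw)
print(request)
```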

Examples

GET Request

GET /example/path?query=param HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: http://www.example.com/previous-page
Cookie: sessionId=abc123

POST Request

POST /submit/form HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
Accept: application/json
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 28
Cookie: sessionId=abc123

username=admin&password=1234
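
A form-encoded body like the one in this POST request can be decoded into individual fields, and its byte length can be compared with the declared Content-Length. This is a minimal sketch using the standard library; the helper name `content_length_matches` is hypothetical, and treating a length mismatch as a feature is a suggestion rather than something the request format requires.

```python
from urllib.parse import parse_qsl

body = "username=admin&password=1234"

# Decode the application/x-www-form-urlencoded body into name/value pairs.
fields = dict(parse_qsl(body, keep_blank_values=True))

# Content-Length counts bytes, not characters, so encode before measuring.
body_length = len(body.encode("utf-8"))

def content_length_matches(declared: str, body: str) -> bool:
    """Return True when a declared Content-Length equals the body's byte length."""
    declared = declared.strip()
    return declared.isdigit() and int(declared) == len(body.encode("utf-8"))

print(fields)
print(body_length)
```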

Conclusion

  • Source IP Address: The IP address of the client
  • Source Port: The port number of the client
  • Destination IP Address: The IP address of the server
  • Destination Port: The port number of the server
  • Timestamp: The time of the request
  • Request Method: GET or POST
  • Request URI: Such as /example/path?query=param or /submit/form
  • HTTP Version: Such as HTTP/1.1
  • Header Fields: Such as Host, User-Agent, Accept, Accept-Language, Accept-Encoding, Connection, Referer, Cookie, etc.
  • Request Body (requests with a body, such as POST): Such as username=admin&password=1234
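
The fields summarized above could be collected into a single record per request and flattened into one row for later processing. The sketch below is one possible layout, not a prescribed schema: the class name `HttpRequestFeatures`, the `header_` column prefix, and the sample addresses and ports are made up for illustration.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class HttpRequestFeatures:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    timestamp: float
    method: str
    uri: str
    version: str
    headers: dict = field(default_factory=dict)
    body: str = ""  # empty for requests without a body, such as GET

    def to_row(self) -> dict:
        """Flatten into one dict, with each header as its own 'header_*' column."""
        row = asdict(self)
        for name, value in row.pop("headers").items():
            row["header_" + name.lower().replace("-", "_")] = value
        return row

record = HttpRequestFeatures(
    src_ip="192.0.2.10", src_port=50000,   # illustrative values only
    dst_ip="192.0.2.20", dst_port=80,
    timestamp=0.0,
    method="GET",
    uri="/example/path?query=param",
    version="HTTP/1.1",
    headers={"Host": "www.example.com", "Accept-Language": "en-US,en;q=0.5"},
)
row = record.to_row()
print(row)
```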

Feature Engineering: Header

In the HTTP protocol, header fields carry metadata about a request or response. There are many header fields, each providing different information; the most common ones are grouped here into request headers and response headers:

Common HTTP Request Header Fields

  • Host: Specifies the target host of the request and, optionally, the port number
    (e.g., Host: www.example.com:80).
  • User-Agent: Identifies the type of client software (e.g., browser) making the request
    (e.g., User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36).
  • Accept: Indicates the MIME types the client can handle
    (e.g., Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8).
  • Accept-Language: Specifies the natural languages the client can handle
    (e.g., Accept-Language: en-US,en;q=0.5).
  • Accept-Encoding: Indicates the content encodings the client can handle
    (e.g., Accept-Encoding: gzip, deflate, br).
  • Connection: Controls the management of the connection
    (e.g., Connection: keep-alive or Connection: close).
  • Referer: Specifies the URL of the referring page
    (e.g., Referer: http://www.example.com/previous-page).
  • Cookie: Contains all the cookie information sent by the client
    (e.g., Cookie: sessionId=abc123).
  • Content-Type: Indicates the MIME type of the request body
    (e.g., Content-Type: application/x-www-form-urlencoded or Content-Type: application/json).
  • Content-Length: Indicates the byte length of the request body
    (e.g., Content-Length: 27).
  • Authorization: Contains credentials for authentication
    (e.g., Authorization: Basic YWxhZGRpbjpvcGVuc2VzYW1l).
  • Cache-Control: Carries directives that control how the request and its response may be cached
    (e.g., Cache-Control: no-cache).
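
Several of these request headers have internal structure that can be broken into finer-grained features. The following is a hedged sketch using only the standard library; the helper names `parse_accept`, `parse_cookie`, and `decode_basic_auth` are hypothetical, and the Accept parser handles only the simple `q=` parameter form shown in the examples above.

```python
import base64
from http.cookies import SimpleCookie

def parse_accept(value: str) -> list:
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    items = []
    for part in value.split(","):
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # the quality value defaults to 1 when absent
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        items.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)

def parse_cookie(value: str) -> dict:
    """Return the cookie names and values from a Cookie header."""
    jar = SimpleCookie()
    jar.load(value)
    return {name: morsel.value for name, morsel in jar.items()}

def decode_basic_auth(value: str) -> tuple:
    """Return (user, password) from an 'Authorization: Basic ...' value."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password

accept = parse_accept(
    "text/html,application/xhtml+xml,application/xml;q=0.9,"
    "image/avif,image/webp,*/*;q=0.8"
)
cookies = parse_cookie("sessionId=abc123")
credentials = decode_basic_auth("Basic YWxhZGRpbjpvcGVuc2VzYW1l")
```

Because Basic credentials are only Base64-encoded, not encrypted, decoded values should be handled as sensitive data.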

Common HTTP Response Header Fields

  • Content-Type: Indicates the MIME type of the response body (e.g., Content-Type: text/html; charset=UTF-8).
  • Content-Length: Indicates the byte length of the response body (e.g., Content-Length: 348).
  • Set-Cookie: Sets cookie information on the client (e.g., Set-Cookie: sessionId=abc123; Path=/; HttpOnly).
  • Cache-Control: Carries directives that control how the response may be cached (e.g., Cache-Control: no-cache, no-store, must-revalidate).
  • Expires: Indicates the date and time when the response expires (e.g., Expires: Wed, 21 Oct 2015 07:28:00 GMT).
  • Last-Modified: Indicates the date and time the resource was last modified (e.g., Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT).
  • ETag: Indicates the entity tag of the resource, used for cache validation (e.g., ETag: "686897696a7c876b7e").
  • Server: Indicates information about the server software (e.g., Server: Apache/2.4.1 (Unix)).
  • Location: Indicates the URL to which the client should be redirected (e.g., Location: http://www.example.com/).
  • Content-Encoding: Indicates the encoding of the response body (e.g., Content-Encoding: gzip).
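
Response headers that hold dates or cookie attributes can likewise be parsed into typed values. This is a minimal sketch using only the standard library; the helper names `parse_http_date` and `parse_set_cookie` are hypothetical, and the cookie helper reads just the first cookie in the header value.

```python
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie

def parse_http_date(value: str):
    """Parse an HTTP date, as used in Expires or Last-Modified, into a datetime."""
    return parsedate_to_datetime(value)

def parse_set_cookie(value: str) -> dict:
    """Return the name, value, and selected attributes of a Set-Cookie header."""
    jar = SimpleCookie()
    jar.load(value)
    name, morsel = next(iter(jar.items()))
    return {
        "name": name,
        "value": morsel.value,
        "path": morsel["path"],
        # Flag attributes such as HttpOnly carry no value of their own.
        "httponly": bool(morsel["httponly"]),
    }

expires = parse_http_date("Wed, 21 Oct 2015 07:28:00 GMT")
last_modified = parse_http_date("Tue, 15 Nov 1994 12:45:26 GMT")
cookie = parse_set_cookie("sessionId=abc123; Path=/; HttpOnly")
print(expires, last_modified, cookie)
```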

Official Documentation

Detailed explanations of HTTP header fields can be found in official documentation, including:

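  1. IETF RFC Documents: HTTP/1.1 was originally standardized in RFC 2616, which was superseded by RFC 7230 to RFC 7235. Those were in turn obsoleted in 2022 by RFC 9110 (HTTP Semantics), RFC 9111 (HTTP Caching), and RFC 9112 (HTTP/1.1). HTTP/2 was specified in RFC 7540, which has been obsoleted by RFC 9113.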
  2. MDN Web Docs: Mozilla’s MDN Web Docs provides detailed and easy-to-understand documentation on HTTP header fields, available on the HTTP headers page.

Cheat Sheet

  • Source IP Address
  • Source Port
  • Destination IP Address
  • Destination Port
  • Timestamp
  • Request Method
  • Request URI
  • HTTP Version
  • Header Fields
    • Host
    • User-Agent
    • Accept
    • Accept-Language
    • Accept-Encoding
    • Connection
    • Referer
    • Cookie
    • Content-Type
    • Content-Length
    • Authorization
    • Cache-Control
  • Request Body (requests with a body, such as POST)