Skip to main content

 

Motivation

 

A customer recently approached me with a challenge: they wanted to feed downstream systems with ordered and banded firewall data. Specifically, they had the following use cases:

  • Identify Top-N firewalls based on configuration complexity and other attributes like greatest number of permissive or shadow rules.
  • Flag percentile-based thresholds for various attributes (e.g., top 10%, bottom 25%)
  • Enable dynamic device selection for exhaustive, compute-heavy rule analysis 

The challenge?  NQE is designed for efficient set-based analysis, therefore it doesn’t require sorting or ranking functions in general.  Nevertheless, this customer’s use case called for an approach to simulate these behaviors without external processing.

So instead of relying on a built-in sort function, we can create a lightweight, math-driven approach that simulates ranking behavior. It’s fast, deterministic, and works entirely inside the NQE layer so that no external processing or comparisons are required.

In this example, we keep it simple by representing “complexity” using only the rule count for each firewall. But in practice, this technique can support a more sophisticated composite complexity score, incorporating things like:

  • Number of rules
  • Zones and services
  • Address object depth
  • Rule tags or match conditions
  • Use of nested structures
  • Rule usage (hits, creation and modification dates, etc)
  • Existence of overly permissive objects or shadows

The key is to produce a normalized rank-like number from the firewall rule count that can power visualizations, alerts, filtering, and banding without sorting functionality.

 

The Approach

 

To generate a ranking output without sorting:

  1. Normalize values using min-max scaling
  2. Add a fractional hash-based offset to break ties
  3. Use a configurable bin spread factor to reduce collisions
  4. Assign each value to a stable, deterministic integer "bin"

The resulting value we will call normalized, which behaves similarly to a rank, derived entirely through simple math.

Note: The test data shown below is ordered by RuleCount for readability and demonstration purposes. However, the technique is explicitly designed to work on unordered input, and you’re encouraged to test it with randomly ordered or real-world datasets — that’s the intended use case.

 

The Code

 

RuleCount = a
{ RuleCount: 9901, FwName: "fw-dmz-h1" },
{ RuleCount: 7884, FwName: "fw-dmz-h2" },
{ RuleCount: 7012, FwName: "fw-dmz-h3" },
{ RuleCount: 6150, FwName: "fw-core-a1" },
{ RuleCount: 6070, FwName: "fw-core-a2" },
{ RuleCount: 5951, FwName: "fw-core-a3" },
{ RuleCount: 5223, FwName: "fw-dist-b1" },
{ RuleCount: 2481, FwName: "fw-dist-b2" },
{ RuleCount: 2220, FwName: "fw-dist-b3" },
{ RuleCount: 1688, FwName: "fw-edge-c1" },
{ RuleCount: 1563, FwName: "fw-edge-c2" },
{ RuleCount: 1514, FwName: "fw-edge-c3" },
{ RuleCount: 1159, FwName: "fw-intsvc-d1" },
{ RuleCount: 1123, FwName: "fw-intsvc-d2" },
{ RuleCount: 902, FwName: "fw-branch-e1" },
{ RuleCount: 885, FwName: "fw-branch-e2" },
{ RuleCount: 877, FwName: "fw-branch-e3" },
{ RuleCount: 824, FwName: "fw-branch-e4" },
{ RuleCount: 822, FwName: "fw-branch-e5" },
{ RuleCount: 631, FwName: "fw-remote-f1" },
{ RuleCount: 474, FwName: "fw-remote-f2" },
{ RuleCount: 372, FwName: "fw-remote-f3" },
{ RuleCount: 331, FwName: "fw-lab-g1" },
{ RuleCount: 327, FwName: "fw-lab-g2" },
{ RuleCount: 326, FwName: "fw-lab-g3" }
];



comparison(i) = i.RuleCount; //field to sort
min_val = (minBy(RuleCount , comparison));
max_val = (maxBy(RuleCount , comparison));

N = float(length(RuleCount));
prime = .61803398875; //Golden ratio->fractional hash seed

foreach n in RuleCount
let x = n.RuleCount
let min_val = float(min_val.RuleCount)
let max_val = float(max_val.RuleCount)
let fx = float(x)
let binScaler = 10.0
let scaled = binScaler * (fx - min_val) * (N - 1.0) / (max_val - min_val)
let hash = fx * prime * 1.0
let hash_offset = hash - float(floor(hash))
let norm_rank = floor(1.0 + scaled + hash_offset )


select {

fwName:n.FwName,
originalValue: x,
scaled: scaled,
hash_input: hash,
hash_offset: hash_offset,
normalized: norm_rank
}

 

Query Logic 

 

This technique maps each record to a bin, simulating a rank.  This effectively normalizes the firewall rule count so the entire record set can be sorted and ranked.  

 

1. Min/Max Calculation

comparison(i) = i.RuleCount; // Field to sort

min_val = (minBy(RuleCount, comparison));

max_val = (maxBy(RuleCount, comparison));

We get the range of values to use for scaling — this anchors normalization.

 

2. Normalization and Bin Scaling

let scaled = binScaler * (fx - min_val) * (N - 1.0) / (max_val - min_val)

This transforms RuleCount into a floating point number representing its relative position in the dataset.

  • The binScaler helps spread values across more buckets to reduce the risk of collisions i.e. collided values have the same calculated bucket duplicating the rank index across the set.  
  • Without it, small datasets with tight ranges can produce too many ties.

 

3. Hash-Based Tie Breaking

let hash = fx * prime

let hash_offset = hash - floor(hash)

We apply a stable hash using the golden ratio. Its irrationality distributes values evenly between 0 and 1.

  • The hash_offset makes sure values that would otherwise land in the same bucket still end up distinct.
  • It’s deterministic — the same input will always yield the same offset.

 

4. Final Bin Assignment

let norm_rank = floor(1.0 + scaled + hash_offset)
  • Combines the normalized value and hash offset, then applies floor() to assign a final integer rank.

 

5. Output Fields

select {

  fwName,

  originalValue,

  scaled,

  hash_input,

  hash_offset,

  normalized

}
  • Original Value is the rulecount per firewall and normalized is the calculated rank, aka bucket.  Think of the normalized rank as an index we can use to characterize the record for rank scoring and sorting.  
  • All other fields are for debugging and process clarity.  

 

What You Can Do With It

  • Top-N filtering (example only):
where normalized >= 20

 

  • Percentile banding (example only):
when normalized >= 22 then "Top 10%"

when normalized >= 17 then "Top 25%"

 

  • Threshold-based alerting
  • Programmatic Input to security intelligence dashboards
  • Sorting approximations

Caveats 

  • Not a true sort — use it for banding, filtering, and percentile logic, not strict ordering -- at least not yet ;)
  • May produce ties — increase binScaler if values are close together.
  • Can be extended —adding array index offsets to guarantee tie break provides a unique normalized score (bucket) for each record.  This allows true sorting of records based upon any criteria.  

 

Summary

This is a high-performance and flexible technique to generate pseudo-rankings without sorting, ideal for environments like NQE where sorting and ordering operations are not exposed.  No iterations, no comparisons, no complex sorting — just simple math.

 

Use it when you need to:

  • Select top-N records
  • Map values to percentile bands
  • Enable threshold filtering, alerting, and insights
  • Sorting (observe caveats)

For those interested, the normalized rank can also be used to reorder a set of records based on any criteria. I’ll share a follow-up post soon to walk through how that works.


 

Be the first to reply!

Reply