Many customers often ask me about digging into data, especially when it involves sifting through device status or custom command outputs. Sometimes, it's a piece of cake to handle because the data is nicely structured, but other times, it's like solving a puzzle. Just last week, I was helping out a customer who needed to extract security policy data from their enforcement points to keep tabs on various compliance matters. The data they had to deal with was all over the place – lines of varying lengths, multiple strings scattered within the lines, and positional changes for the data we needed.
This got me thinking, and I wanted to share some techniques for handling these tricky data chunks. I can't share the exact customer query with you, but I've thrown together a sample query to show one way of tackling this.
Imagine you've got data like this:
"Don't teach me how to extract stuff with IP address 10.1.1.1 and float 10 from string."
"Please teach me how not to extract stuff with IP address 10.1.1.5 and float 21.1 from string."
"Please teach me how every method to extract stuff with IP address 123.1.1.5 and float 45.777 from string."
The mission here is to grab the first word, all the text between "me" and "stuff," snag the second word from that text, and the IP address and number as a float type. You'll notice that the amount of text to grab varies in each line, which messes with the position of the IP address and number. It's clear that this data doesn't play nice in columns, so we need a few tricks up our sleeves to get the job done.
Below, I've got a query that offers one way to crack this puzzle. I'm also eager to hear from the community about any other challenges and solutions you've stumbled upon in the world of data extraction. Load this up and tinker around-its the only way to learn. Don’t hesitate to comment, criticize, or ask for more clarity.
/*Lesson is to extract the first word, the words between (me and stuff) the ip address and the float number with correct type
In addition, the number of words extracted are unknown. Extract the words, parse, assign var to 2nd word only.
*/
//functions
//splits text on space
splitOnSpace(s) = patternMatch(s, `{string*}`);
//simple removal of unwanted items
remove(s, substring) = replace(s, substring, "");
//test that some character exists
contains(s, substring) = matches(s, "*" + substring + "*");
//simple replace
replace1(s, target1, replacement) = (replace(s, target1, replacement));
//replaces target1 and 2 with replacment
replace2(s, target1, target2, replacement) = replace(replace(s, target1, replacement), target2, replacement);
//Vars
pattern = ```
{word:(string)} {"teach me"} {text:((!"stuff" string)*)} {(!ipv4Address string)*} {ipAddress:ipv4Address} {(!float string)*} {numb:float}
```;
parseThis =```
Dont teach me how to extract stuff with ip address 10.1.1.1 and float 10 from string
Please teach me how not to extract stuff with ip address 10.1.1.5 and float 21.1 from string
Please teach me the best method to extract stuff with ip address 123.1.1.5 and float 45.777 from string
```;
//main
foreach i in r0]
let data = parseConfigBlocks(OS.UNKNOWN, toString(parseThis)) //makes a list of config lines from String->parseThis
let parsedData = blockMatches(data , pattern) //apply pattern and match criteria-creates a list of vars and metadata
foreach item in parsedData
//unknown word extraction routine
let wordList = item.data.text //get the unknown number of words
let wordText = replace1(toString(item.data.text), "," , " ") //Convert to space delimited string
let wordText = replace2(wordText, "w", "]", "")//clean brackets from text
let word2 = patternMatch(wordText, `{string} {string}`)
select{
word1:item.data.word,
ipAddress:item.data.ipAddress,
float:item.data.numb,
wordText:wordText,
word2:word2,
rawdata: data,
parsed: parsedData,
data: item.data,
block: item.blocks, //show each line iteration
diff: item.diffCount
}