What You See vs. What the Browser Sees
When you look at a web page, you see text, images, buttons, and forms arranged in a visual layout. But the browser sees something very different: a structured document made of nested elements, each described by code.
Understanding this difference is the key to understanding how automation works. webSlinger doesn't interact with the visual appearance of a page -- it interacts with the underlying structure.
For example, picture a login form with two fields and a button: an email field showing user@example.com and a password field showing ********.
Every visible element on the page -- every button, link, heading, image, and input field -- has a corresponding piece of code that describes what it is and how it behaves. This code is called HTML (HyperText Markup Language).
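As an illustration, a login form like the one described above might be written in HTML roughly as follows. The markup, the `id`, and the attribute values are assumptions made for this sketch, not taken from any real page. The Python snippet uses the standard library's `html.parser` to list each opening tag in the order a browser would encounter it.

```python
from html.parser import HTMLParser

# Hypothetical markup for a login form with two fields and a button.
LOGIN_HTML = """
<html>
  <body>
    <form id="login">
      <input type="email" name="email" value="user@example.com">
      <input type="password" name="password">
      <button type="submit">Log in</button>
    </form>
  </body>
</html>
"""


class TagLister(HTMLParser):
    """Collects every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


def list_tags(html):
    """Return (tag, attributes) for each opening tag found in the HTML."""
    lister = TagLister()
    lister.feed(html)
    return lister.tags


for tag, attrs in list_tags(LOGIN_HTML):
    print(tag, attrs)
```

Each visible piece of the form (two fields, one button) shows up as a tag with attributes, alongside the structural tags (`html`, `body`, `form`) that are never drawn on screen.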
The DOM: A Page as a Tree
The browser reads HTML and builds an internal model of the page called the DOM (Document Object Model). Think of it as a family tree where every element except the topmost one is nested inside a parent element.
Key terminology:
- Element -- A single node in the tree (a tag with its content). The <form> is an element.
- Parent -- The element that contains another. body is the parent of form.
- Children -- The elements inside a parent. The input fields are children of form.
- Siblings -- Elements that share the same parent. The two input fields are siblings.
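The terms above can be made concrete with a small tree builder. This is a minimal sketch, assuming well-formed markup. The `Node` and `TreeBuilder` classes are hypothetical helpers written for this example, and a real browser's DOM construction is far more forgiving and detailed.

```python
from html.parser import HTMLParser

# Tags that never contain children, so they must not become the "current" parent.
VOID_TAGS = {"input", "img", "br", "hr", "meta", "link"}


class Node:
    """One element in the tree, linked to its parent and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def siblings(self):
        """Other elements that share this element's parent."""
        if self.parent is None:
            return []
        return [child for child in self.parent.children if child is not self]


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML (assumes tags are properly nested)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # later tags nest inside this one

    def handle_endtag(self, tag):
        # Step back up only when the closing tag matches the open element.
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    return builder.root


page = "<html><body><form><input><input><button>Log in</button></form></body></html>"
root = build_tree(page)
form = root.children[0].children[0].children[0]  # html -> body -> form

print("parent of form:", form.parent.tag)
print("children of form:", [child.tag for child in form.children])
print("siblings of first input:", [s.tag for s in form.children[0].siblings()])
```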
Selectors: Finding Elements
A selector is a pattern that describes how to find a specific element in the DOM tree. Think of it as an address -- just as a street address uniquely identifies a building in a city, a well-built selector identifies exactly one element on a page.
A selector can use any combination of an element's properties to pinpoint it: its tag name, its attributes, its text content, its position among siblings, or its location within the tree.
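To illustrate these properties, the sketch below runs a few selectors against a hypothetical login page. Browsers typically work with CSS selectors or full XPath. As a stand-in that needs no browser, this example uses the limited XPath subset in Python's `xml.etree.ElementTree`, which requires the markup to be well-formed XML. The page content and the `select_unique` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, written as well-formed XML so ElementTree can parse it.
PAGE = """
<html>
  <body>
    <form id="login">
      <input type="email" name="email" />
      <input type="password" name="password" />
      <button type="submit">Log in</button>
    </form>
  </body>
</html>
""".strip()

ROOT = ET.fromstring(PAGE)


def select_unique(root, path):
    """Return the single element matching path, or fail loudly if ambiguous."""
    matches = root.findall(path)
    if len(matches) != 1:
        raise LookupError(f"{path!r} matched {len(matches)} elements, expected 1")
    return matches[0]


# One selector per kind of property mentioned above.
by_tag = select_unique(ROOT, ".//button")                    # tag name
by_attribute = select_unique(ROOT, ".//input[@type='password']")  # attribute
by_text = select_unique(ROOT, ".//button[.='Log in']")       # text content (Python 3.7+)
by_position = select_unique(ROOT, ".//form/input[2]")        # position among same-tag siblings
by_location = select_unique(ROOT, "./body/form[@id='login']/button")  # path through the tree

print(by_attribute.get("name"), by_position.get("name"), by_text.tag)

# A selector that is too broad does not pinpoint anything:
try:
    select_unique(ROOT, ".//input")
except LookupError as error:
    print(error)
```

The last call shows why combinations matter: the tag name alone matches both fields, so it needs an attribute or a position to narrow it down to one element.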
How webSlinger Builds Selectors
When you interact with an element during recording, webSlinger doesn't just grab the first selector it can find. It:
- Extracts dozens of features from the element -- tag name, id, classes, text content, position among siblings, parent structure, and more
- Builds a decision tree that determines which combination of features uniquely identifies this element among all others on the page
- Validates in real time by immediately testing the generated selector in a separate tab to confirm it finds the correct element
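The idea behind these steps can be sketched in a few lines. This is not webSlinger's actual algorithm: instead of a decision tree it uses a brute-force search for the smallest combination of features that matches only the target, and it "validates" by re-running the match against an in-memory list rather than a separate browser tab. The feature names, the sample elements, and every function here are hypothetical.

```python
from itertools import combinations

# Hypothetical page: each element reduced to a small dictionary of features.
ELEMENTS = [
    {"tag": "input", "id": None, "type": "email", "class": "field", "position": 0},
    {"tag": "input", "id": None, "type": "password", "class": "field", "position": 1},
    {"tag": "button", "id": None, "type": "submit", "class": "primary", "position": 2},
    {"tag": "button", "id": None, "type": "submit", "class": "secondary", "position": 3},
    {"tag": "a", "id": None, "type": None, "class": "primary", "position": 4},
]

# Assumption: attribute-based features are tried first; position is a last
# resort because it changes whenever elements are added or removed.
STABLE_FEATURES = ["id", "type", "class", "tag"]
ALL_FEATURES = STABLE_FEATURES + ["position"]


def matches(element, features):
    """True if the element has every feature value in the candidate."""
    return all(element.get(key) == value for key, value in features.items())


def validate(candidate, target, elements):
    """Confirm the candidate finds the target and nothing else."""
    hits = [element for element in elements if matches(element, candidate)]
    return len(hits) == 1 and hits[0] is target


def smallest_unique_features(target, elements):
    """Search for the smallest feature combination that isolates the target."""
    for pool in (STABLE_FEATURES, ALL_FEATURES):
        usable = [key for key in pool if target.get(key) is not None]
        for size in range(1, len(usable) + 1):
            for keys in combinations(usable, size):
                candidate = {key: target[key] for key in keys}
                if validate(candidate, target, elements):
                    return candidate
    return None  # no combination of these features is unique


print(smallest_unique_features(ELEMENTS[1], ELEMENTS))
print(smallest_unique_features(ELEMENTS[2], ELEMENTS))
```

For the password field a single feature is enough, while the primary submit button needs two, because its type, class, and tag are each shared with some other element on the page.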
The AI Gap
AI Knows What, Not How
AI models are trained on vast amounts of human-written text. They can understand action-level instructions like "click the menu button," "select March in the month dropdown," or "extract the price for item 1." They know what to do at each step.
But every one of those actions requires finding a specific element in the DOM. To click a button, you need a selector that locates that exact button. To select an option from a dropdown, you need a selector for that specific <select> element. AI can understand the action, but it can't produce the selector needed to carry it out.
The Session Map: A Rosetta Stone
This is where webSlinger comes in. Before you start recording, you describe your objectives -- the goals of the session. These are the "what": download last month's bank statement, extract product prices, fill out a form with this data.
Then you record the session. Each action you perform -- every click, input, and extraction -- becomes an action in the session map, paired with a robust set of validated selectors. These are the "how": the precise DOM instructions needed to carry out each step.
The session map matches goals to actions, connecting what you wanted to accomplish with the exact sequence of steps and selectors needed to accomplish it:
- Goals -- What needs to happen, described in human terms
- Actions -- How to make it happen, each with multiple redundant selectors confirmed in real time
The result is a Rosetta Stone that translates between human intent and the precise language of the DOM.
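One way to picture the pairing of goals and actions is as a small data structure. The sketch below is purely illustrative: the class names, field names, and selector strings are assumptions, and webSlinger's real session map format may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Action:
    """The "how": one recorded step plus its redundant selectors."""
    kind: str                     # e.g. "click", "input", "extract"
    selectors: List[str]          # most preferred first
    value: Optional[str] = None   # text to type, if any


@dataclass
class Goal:
    """The "what": an objective described in human terms."""
    description: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class SessionMap:
    goals: List[Goal] = field(default_factory=list)

    def steps(self) -> Iterator[Tuple[str, Action]]:
        """Yield every action in order, tagged with the goal it serves."""
        for goal in self.goals:
            for action in goal.actions:
                yield goal.description, action


def first_working_selector(action: Action, still_matches: Callable[[str], bool]) -> Optional[str]:
    """Fall back through the redundant selectors until one still matches."""
    for selector in action.selectors:
        if still_matches(selector):
            return selector
    return None


# Hypothetical session: one goal, two recorded actions.
EXAMPLE = SessionMap(goals=[
    Goal("Log in to the account", [
        Action("input", ["#email", "input[type='email']"], "user@example.com"),
        Action("click", ["button[type='submit']"]),
    ]),
])

for description, action in EXAMPLE.steps():
    print(description, "->", action.kind, action.selectors)
```

Keeping several selectors per action is what makes the fallback in `first_working_selector` possible: if a page change breaks the preferred selector, a later one in the list may still locate the element.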