What You See vs. What the Browser Sees

When you look at a web page, you see text, images, buttons, and forms arranged in a visual layout. But the browser sees something very different: a structured document made of nested elements, each described by code.

Understanding this difference is the key to understanding how automation works. webSlinger doesn't interact with the visual appearance of a page -- it interacts with the underlying structure.

What you see:

A login form with two fields and a button:

Email
user@example.com
Password
********
Log In

What the browser sees:

<form id="loginForm">
  <label>Email</label>
  <input type="email" name="email">
  <label>Password</label>
  <input type="password" name="pass">
  <button type="submit">Log In</button>
</form>

Every visible element on the page -- every button, link, heading, image, and input field -- has a corresponding piece of code that describes what it is and how it behaves. This code is called HTML (HyperText Markup Language).

The DOM: A Page as a Tree

The browser reads HTML and builds an internal model of the page called the DOM (Document Object Model). Think of it as a family tree: every element except the topmost one (html) is nested inside a parent element.

html
|
+-- head
|   +-- title -- "My Bank"
|
+-- body
    +-- nav
    |   +-- a -- "Home"
    |   +-- a -- "Accounts"
    |
    +-- form
        +-- label -- "Email"
        +-- input (type="email")
        +-- label -- "Password"
        +-- input (type="password")
        +-- button -- "Log In"

Key terminology:

  • Element -- A single node in the tree (a tag with its content). The <form> is an element.
  • Parent -- The element that contains another. body is the parent of form.
  • Children -- The elements inside a parent. The input fields are children of form.
  • Siblings -- Elements that share the same parent. The two input fields are siblings.

Why this matters for automation: When webSlinger needs to click a button or type into a field, it navigates this tree to find the exact element. The tree structure is what makes each element uniquely identifiable, even when two elements look identical on screen.
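To make the tree and its terminology concrete, here is a minimal Python sketch that parses the example page and walks the result. It uses only the standard library's html.parser; the Node and TreeBuilder classes and the outline helper are names invented for this illustration, and the code is not how webSlinger or a browser builds the DOM internally. It ignores text and assumes tidy, correctly nested HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they cannot contain children.
VOID_TAGS = {"input", "img", "br", "hr", "meta", "link"}


class Node:
    """One element in the tree: a tag, its attributes, parent, and children."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree (text nodes are ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend: later tags are nested inside it

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent  # climb back to the parent


def outline(node, depth=0):
    """Return the tree as a list of tag names indented by nesting depth."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


PAGE = """
<html>
  <head><title>My Bank</title></head>
  <body>
    <nav><a>Home</a><a>Accounts</a></nav>
    <form>
      <label>Email</label><input type="email">
      <label>Password</label><input type="password">
      <button>Log In</button>
    </form>
  </body>
</html>
"""

builder = TreeBuilder()
builder.feed(PAGE)
html_node = builder.root.children[0]
print("\n".join(outline(html_node)))

form = html_node.children[1].children[1]         # html -> body -> form
inputs = [c for c in form.children if c.tag == "input"]
print(form.parent.tag)                           # parent of form: body
print([c.tag for c in form.children])            # children of form
print(inputs[0].parent is inputs[1].parent)      # True: the inputs are siblings
```

The two input fields have the same tag and sit next to each other, yet each is a separate node with its own attributes and its own position under form, which is what lets a tool tell them apart.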

Tags and Attributes

Tags: What an Element Is

Every HTML element has a tag name that describes its role on the page. Common tags you'll encounter:

Tag       Purpose                                    Example
<input>   Text fields, checkboxes, radio buttons     Login forms, search bars
<button>  Clickable buttons                          "Submit", "Next", "Add to Cart"
<a>       Links (anchors)                            Navigation links, "Read more"
<select>  Dropdown menus                             Country selector, date picker
<div>     Generic container (groups other elements)  Cards, sections, panels
<table>   Tabular data                               Transaction lists, pricing tables

Attributes: Properties of an Element

Attributes provide additional information about an element. They appear inside the opening tag as name="value" pairs.

<input type="email" id="userEmail" class="form-field" placeholder="Enter your email">

Important attributes for automation:

  • id -- A unique identifier for the element. No two elements on the same page should have the same id, so when a page assigns stable ids, this is usually the most reliable way to find an element.
  • class -- A category label. Multiple elements can share the same class, and one element can have multiple classes. Used for styling and grouping.
  • type -- Specifies the kind of input (text, email, password, checkbox, etc.).
  • name -- Identifies the field when the form is submitted. Often used by the server to know which value came from which field.
  • href -- The URL a link points to.

Think of it this way: The tag tells you what something is (a button, an input field). The attributes tell you which specific one it is and how it behaves.
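As a small illustration of the tag/attribute split, the sketch below feeds the example input element to Python's standard html.parser and reads back its tag name and attributes. The TagInspector class is a name made up for this example; it is only meant to show that attributes are plain name/value pairs a program can read.

```python
from html.parser import HTMLParser


class TagInspector(HTMLParser):
    """Records the tag name and attributes of every opening tag it sees."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; store it as a dict.
        self.elements.append((tag, dict(attrs)))


inspector = TagInspector()
inspector.feed(
    '<input type="email" id="userEmail" class="form-field" '
    'placeholder="Enter your email">'
)

tag, attrs = inspector.elements[0]
classes = attrs.get("class", "").split()  # one element may carry several classes

print(tag)            # what the element is
print(attrs["id"])    # which specific one it is
print(classes)
```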

What You'll See in webSlinger

When you use webSlinger to extract data from a page, the extraction menu shows you these same concepts. Each extractable element is labeled with its tag name in angle brackets -- <span>, <a>, <div> -- so you know what kind of element you're working with.

For each element, webSlinger shows you the data you can extract:

  • Direct text -- The text that belongs directly to that element, excluding text inside any child elements. For example, if a <div> contains a price and a nested <span> with a currency symbol, the direct text is just the price.
  • Text -- All the text inside the element, including text from every child element combined. This only appears when it differs from the direct text.
  • Attributes -- The raw HTML attribute values from the element. You'll see the actual attribute names like href, src, or data-product-id, along with their current values. These are useful when the data you need is stored in the code rather than displayed as visible text.
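The difference between direct text and text is easiest to see on a tiny example. The sketch below uses Python's standard xml.etree.ElementTree on a well-formed snippet; the direct_text and full_text helpers and the sample markup are assumptions made for illustration and only approximate the definitions above, not webSlinger's actual implementation.

```python
import xml.etree.ElementTree as ET


def direct_text(el):
    """Text that belongs to el itself, skipping text inside child elements."""
    parts = [el.text or ""]                          # text before the first child
    parts.extend(child.tail or "" for child in el)   # text after each child
    return "".join(parts).strip()


def full_text(el):
    """All text inside el, including text from every nested element."""
    return "".join(el.itertext()).strip()


# A price whose currency symbol lives in a nested <span>.
div = ET.fromstring('<div data-product-id="sku-123"><span>$</span>29.99</div>')

print(direct_text(div))   # 29.99
print(full_text(div))     # $29.99
print(div.attrib)         # raw attribute names and values
```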

When a value contains numbers, webSlinger also offers a numeric conversion option. This applies to direct text, text, and attributes alike -- so if an attribute like data-price contains "$29.99", you can extract just the number 29.99.
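A numeric conversion of this kind could look roughly like the following Python sketch. The to_number function is hypothetical, and the rules it applies (take the first number, treat commas as thousands separators, ignore signs) are assumptions for the example; webSlinger's own conversion may behave differently.

```python
import re

# One or more digits, optionally followed by a decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_number(value):
    """Return the first number found in value as a float, or None if absent."""
    cleaned = value.replace(",", "")   # assumes commas are thousands separators
    match = NUMBER.search(cleaned)
    return float(match.group()) if match else None


print(to_number("$29.99"))         # 29.99
print(to_number("1,299.00 USD"))   # 1299.0
print(to_number("Out of stock"))   # None
```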

Each extractable value is shown with a sample from the page, so you can see exactly what you'll get before you select it.

Selectors: Finding Elements

A selector is a pattern that describes how to find a specific element in the DOM tree. Think of it as an address -- just as a street address uniquely identifies a building in a city, a selector uniquely identifies an element on a page.

A selector can use any combination of an element's properties to pinpoint it: its tag name, its attributes, its text content, its position among siblings, or its location within the tree.
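To show how different properties can lead to the same element, the sketch below finds the password field and the Log In button in several ways. It uses the limited XPath subset built into Python's xml.etree.ElementTree (Python 3.7 or later for the text test), so the page is written as well-formed XML with self-closing input tags. The source does not specify webSlinger's selector syntax, so treat these expressions purely as examples of the idea.

```python
import xml.etree.ElementTree as ET

PAGE = """
<html><body>
  <nav><a href="/home">Home</a><a href="/accounts">Accounts</a></nav>
  <form id="loginForm">
    <label>Email</label><input type="email" name="email"/>
    <label>Password</label><input type="password" name="pass"/>
    <button type="submit">Log In</button>
  </form>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Four different "addresses", each built from a different property.
by_attribute = root.find(".//input[@type='password']")   # attribute value
by_position = root.find(".//form/input[2]")              # second input in the form
by_text = root.find(".//button[.='Log In']")             # text content
by_location = root.find("./body/form/button")            # path through the tree

print(by_attribute is by_position)   # True: both reach the password field
print(by_text is by_location)        # True: both reach the Log In button
```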

Why resilience matters: Websites change. Developers rename classes, add new elements, and restructure layouts. Any individual selector can break when the properties it depends on change. webSlinger addresses this by generating multiple distinct selectors for each element, each built from a different combination of features. During automation, these selectors vote on the correct element through consensus -- so even if some selectors break due to page changes, the others still find the right target.
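The voting idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: elements are plain dictionaries, each selector is a simple test function, and a selector abstains when it matches nothing or more than one element. The source does not describe webSlinger's actual voting rules.

```python
from collections import Counter

# The page after a redesign: the login button's class used to be "btn-login".
elements = [
    {"tag": "a", "text": "Home", "class": "nav-link"},
    {"tag": "button", "text": "Log In", "class": "primary-action", "type": "submit"},
    {"tag": "button", "text": "Cancel", "class": "secondary-action", "type": "button"},
]

# Hypothetical selectors recorded before the redesign, each using different features.
selectors = {
    "by class": lambda el: el.get("class") == "btn-login",   # broken by the rename
    "by text": lambda el: el.get("text") == "Log In",
    "by type": lambda el: el["tag"] == "button" and el.get("type") == "submit",
}


def vote(selectors, elements):
    """Return the index of the element most selectors agree on, or None."""
    votes = Counter()
    for matches in selectors.values():
        hits = [i for i, el in enumerate(elements) if matches(el)]
        if len(hits) == 1:        # abstain when broken or ambiguous
            votes[hits[0]] += 1
    if not votes:
        return None
    index, _count = votes.most_common(1)[0]
    return index


winner = vote(selectors, elements)
print(elements[winner]["text"])   # Log In
```

The class-based selector no longer matches anything, but the other two still agree, so the correct button is found.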

How webSlinger Builds Selectors

When you interact with an element during recording, webSlinger doesn't just grab the first selector it can find. It:

  1. Extracts dozens of features from the element -- tag name, id, classes, text content, position among siblings, parent structure, and more
  2. Builds a decision tree that determines which combination of features uniquely identifies this element among all others on the page
  3. Validates in real time by immediately testing the generated selector in a separate tab to confirm it finds the correct element

The result: Selectors that survive routine website updates because they're built from multiple distinguishing features rather than a single fragile property.
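To give a feel for what "a combination of features that uniquely identifies an element" means, here is a simplified Python sketch. It is not a decision tree and not webSlinger's algorithm: it swaps in a brute-force search over small feature combinations, with made-up feature names and a hypothetical unique_feature_sets function, just to show that several different combinations can each single out the same element.

```python
from itertools import combinations


def unique_feature_sets(target, others, max_size=2):
    """Return feature combinations of target that no other element shares."""
    results = []
    features = sorted(target)
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            # A combination is useless if any other element matches all of it.
            shared = any(
                all(other.get(name) == target[name] for name in combo)
                for other in others
            )
            if not shared:
                results.append(combo)
    return results


# The element the user clicked, described by a handful of extracted features.
target = {"tag": "button", "class": "btn", "text": "Log In", "parent": "form"}

# Every other element on the page.
others = [
    {"tag": "button", "class": "btn", "text": "Cancel", "parent": "form"},
    {"tag": "a", "class": "link", "text": "Log In", "parent": "nav"},
]

for combo in unique_feature_sets(target, others):
    print(combo)
# ('class', 'text')
# ('parent', 'text')
# ('tag', 'text')
```

No single feature is unique here, but three different pairs are. Each pair could back its own selector, so a change to one feature (say, the class name) leaves the other selectors working.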

The AI Gap

AI Knows What, Not How

AI models are trained on vast amounts of human-written text. They can understand action-level instructions like "click the menu button," "select March in the month dropdown," or "extract the price for item 1." They know what to do at each step.

But every one of those actions requires finding a specific element in the DOM. To click a button, you need a selector that locates that exact button. To select an option from a dropdown, you need a selector for that specific <select> element. AI can understand the action, but it can't produce the selector needed to carry it out.

This is the gap: AI speaks our language, but the language of websites is selectors, and on its own AI doesn't know how to translate from one to the other.

The Session Map: A Rosetta Stone

This is where webSlinger comes in. Before you start recording, you describe your objectives -- the goals of the session. These are the "what": download last month's bank statement, extract product prices, fill out a form with this data.

Then you record the session. Each action you perform -- every click, input, and extraction -- becomes an action in the session map, paired with a robust set of validated selectors. These are the "how": the precise DOM instructions needed to carry out each step.

The session map matches goals to actions, connecting what you wanted to accomplish with the exact sequence of steps and selectors needed to accomplish it:

  • Goals -- What needs to happen, described in human terms
  • Actions -- How to make it happen, each with multiple redundant selectors confirmed in real time

The result is a session map -- a Rosetta Stone that translates between human intent and the precise language of the DOM.
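One way to picture a session map is as a small data structure that ties each goal to its recorded actions. The Python sketch below is purely illustrative: the class names, field names, and the CSS-style selector strings are all assumptions, since the source does not describe how webSlinger actually stores a session map.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Action:
    kind: str                     # "click", "input", or "extract"
    selectors: List[str]          # several redundant selectors for one element
    value: Optional[str] = None   # text to type, if the action needs any


@dataclass
class Goal:
    description: str                                       # the "what"
    actions: List[Action] = field(default_factory=list)    # the "how"


@dataclass
class SessionMap:
    goals: List[Goal] = field(default_factory=list)

    def steps(self) -> Iterator[Tuple[str, Action]]:
        """Yield (goal description, action) pairs in recorded order."""
        for goal in self.goals:
            for action in goal.actions:
                yield goal.description, action


session = SessionMap(goals=[
    Goal(
        description="Log in to the bank",
        actions=[
            Action("input", ["#userEmail", "input[type='email']"],
                   value="user@example.com"),
            Action("click", ["button[type='submit']", "#loginForm button"]),
        ],
    ),
])

for description, action in session.steps():
    print(f"{description}: {action.kind} via {len(action.selectors)} selectors")
```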

With a session map, AI can cross the gap. It can read your goals, follow the recorded sequence of actions, and execute each one using validated selectors -- turning human knowledge into reliable automation.