Build Your First Workflow with Web Looper: A Step-by-Step Tutorial
Building an automated workflow with Web Looper saves time on repetitive web tasks such as scraping data, filling forms, or monitoring pages. This step-by-step tutorial assumes the defaults: Web Looper is installed and running locally, you have a project directory ready, and you have basic familiarity with web pages (links, CSS selectors).
1. Define the workflow goal
Decide a clear, concrete outcome. Example: extract product names and prices from a category page and save them to CSV.
2. Open a new workflow
- Create a new workflow file (JSON/YAML) or open the Web Looper GUI and click “New Workflow.”
- Name it “products-to-csv”.
3. Configure the start URL
- Start URL: set to the category page you want to scrape, e.g., https://example.com/category/widgets
4. Add navigation steps
- Load page: set a step to load the start URL and wait for network idle or a specific element (e.g., product list).
- Pagination (optional): if multiple pages, add a loop:
- Locate the “next page” button selector.
- Add a conditional step: while “next” exists, click it, wait for load, and continue extracting.
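Web Looper's exact loop syntax depends on your version, but conceptually the pagination loop above behaves like this plain-JavaScript sketch. The `getPage` function and the `items`/`nextUrl` properties are stand-ins for the real load and extract steps, not Web Looper API:

```javascript
// Conceptual pagination loop: keep following "next" links until none remain.
// `getPage(url)` is a hypothetical stand-in for Web Looper's load step; it is
// assumed to return { items, nextUrl }, with nextUrl null on the last page.
async function collectAllPages(getPage, startUrl) {
  const items = [];
  let url = startUrl;
  while (url) {
    const page = await getPage(url); // load the page and wait for it to settle
    items.push(...page.items);       // collect this page's extracted items
    url = page.nextUrl;              // falsy when no "next" button exists
  }
  return items;
}
```

The loop terminates naturally when the "next" selector no longer matches, which is exactly the conditional step described above.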
5. Identify selectors for data
- Inspect the page and find selectors for fields:
- Product name: .product-card .title
- Price: .product-card .price
- Product link (optional): .product-card a (read its href attribute)
- Use CSS selectors or XPath depending on page structure.
6. Extract data
- Add an “Extract” action in the workflow targeting the product list container.
- For each product item, map fields:
- name -> .product-card .title
- price -> .product-card .price
- link -> .product-card a (attribute: href)
- Ensure you set the extraction to return an array of items per page.
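Outside Web Looper, the same per-item mapping can be sketched in plain JavaScript. The HTML fragment below is an assumed example matching the tutorial's selectors, and the regexes only keep the sketch dependency-free; a real run would use Web Looper's extractor or a proper DOM parser:

```javascript
// Conceptual sketch of step 6: turn an HTML fragment into an array of
// { name, price, link } objects, one per product card.
const sampleHtml = `
  <div class="product-card"><span class="title"> Widget A </span><span class="price">$19.99</span><a href="/w/a">view</a></div>
  <div class="product-card"><span class="title">Widget B</span><span class="price">$5.00</span><a href="/w/b">view</a></div>
`;

function extractProducts(html) {
  const items = [];
  const cardRe = /<div class="product-card">([\s\S]*?)<\/div>/g;
  let m;
  while ((m = cardRe.exec(html)) !== null) {
    const card = m[1];
    // Return the first capture group of `re` within this card, or null.
    const pick = (re) => { const f = card.match(re); return f ? f[1] : null; };
    items.push({
      name: (pick(/<span class="title">([\s\S]*?)<\/span>/) || '').trim(),
      price: pick(/<span class="price">([\s\S]*?)<\/span>/),
      link: pick(/href="([^"]*)"/),
    });
  }
  return items; // an array of items per page, as step 6 requires
}
```

The key point mirrors the last bullet above: the extraction returns an array, not a single object, so each page contributes every matching card.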
7. Clean and transform (optional)
- Add transformation steps:
- Strip currency symbols from price (e.g., remove “$”).
- Trim whitespace from names.
- Convert price to a number type for correct sorting/aggregation.
Example pseudocode transformation:
```javascript
// Clean extracted fields before saving.
item.price = parseFloat(item.price.replace(/[^0-9.]/g, ""));
item.name = item.name.trim();
```
8. Store results
- Add an output action to append extracted items to a CSV file:
- Filename: products.csv
- Headers: name, price, link
- Alternatively, save to JSON or push to a database/API.
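Whichever output action your Web Looper build provides, the CSV it writes should look like the result of this small sketch (field names taken from the tutorial; quoting follows the common RFC 4180 convention):

```javascript
// Build CSV text from extracted items: a header row, then one row per item.
// Values containing commas, quotes, or newlines are quoted and escaped.
function toCsv(items, headers) {
  const escape = (v) => {
    const s = String(v == null ? '' : v);
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const rows = items.map((item) => headers.map((h) => escape(item[h])).join(','));
  return [headers.join(','), ...rows].join('\n');
}
```

In Node you could append the result to products.csv with fs.appendFileSync; inside Web Looper, the built-in output action handles this for you.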
9. Error handling and retries
- Add retry logic for network steps (e.g., retry 2 times on failure).
- Add a fallback when selectors aren’t found: log the page URL and continue.
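If your Web Looper version has no built-in retry option, the same behavior can be sketched as a generic wrapper around any async step (the function name and default delay here are assumptions, not Web Looper API):

```javascript
// Retry an async step up to `retries` extra times, pausing between attempts.
async function withRetry(step, retries = 2, delayMs = 1000) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // all attempts failed; let the caller log the URL and continue
}
```

Wrapping only the network steps (page loads, saves) keeps selector failures on the fallback path described above, where logging and continuing is usually the right move.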
10. Test the workflow
- Run the workflow on a single page first.
- Inspect the output CSV for correctness: fields present, prices cleaned.
- If items are missing, refine selectors and re-run.
11. Schedule or run at scale
- For regular scraping, schedule the workflow (e.g., daily).
- When scaling, respect site terms and rate limits: add delays between page requests (e.g., 1–3 seconds) and set concurrency to a low value.
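If you script delays yourself rather than using a built-in setting, the 1–3 second pause suggested above can be expressed as a small helper (the jitter range is the tutorial's suggestion, not a Web Looper default):

```javascript
// Pause for a random 1-3 seconds between page requests to stay polite.
// Randomizing the delay avoids a fixed, machine-like request rhythm.
function politeDelay(minMs = 1000, maxMs = 3000) {
  const ms = minMs + Math.random() * (maxMs - minMs);
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage: await politeDelay() between each page load in the pagination loop.
```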
12. Example minimal workflow (conceptual)
```yaml
name: products-to-csv
start_url: https://example.com/category/widgets
steps:
  - load: {waitFor: '.product-list'}
  - extract:
      container: '.product-card'
      fields:
        name: '.title'
        price: '.price'
        link: {selector: 'a', attr: 'href'}
  - transform:
      - code: |
          item.price = parseFloat(item.price.replace(/[^0-9.]/g, ""));
          item.name = item.name.trim();
  - save: {format: csv, path: products.csv}
  - paginate:
      nextSelector: '.pagination .next'
      loop: true
```
Best practices
- Respect robots.txt and site terms of service.
- Use realistic delays and identify yourself with a polite User-Agent if required.
- Limit scraping frequency to avoid overloading sites.
- Test selectors with multiple pages and device viewports if the site has responsive layouts.
Follow these steps and you’ll have a reliable first Web Looper workflow that extracts product data into a CSV. If you want, I can generate a ready-to-run workflow file for a specific target URL — tell me the URL and desired fields.