Web Scraping Supplier Stock Levels with Python and BeautifulSoup

The Manual Stock Check Problem

When you sell products from multiple suppliers, keeping inventory accurate is a constant battle. Each supplier has its own website or portal, and stock levels change daily. Checking five or more supplier sites by hand every morning takes hours.

We built a dashboard that scrapes supplier stock levels automatically and syncs them to Shopify.

The Stack

Python with Flask for the web dashboard
BeautifulSoup for HTML parsing
APScheduler for periodic scraping
Shopify API for inventory updates
Docker for consistent deployment
Railway for hosting

Building the Scraper

Handling Different Supplier Sites

Each supplier's website is different. We use a plugin architecture where each supplier has its own scraper module:

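# scrapers/base.py
from abc import ABC, abstractmethod

import requests


class BaseScraper(ABC):
    """Common interface that every supplier scraper implements."""

    def __init__(self, config):
        self.config = config
        # One session per scraper so login cookies persist across requests
        self.session = requests.Session()

    @abstractmethod
    def login(self):
        """Authenticate self.session against the supplier site."""

    @abstractmethod
    def get_stock_levels(self):
        """Returns dict of {sku: quantity}"""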

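# scrapers/supplier_a.py
from bs4 import BeautifulSoup

from scrapers.base import BaseScraper
from scrapers.parsing import parse_quantity  # assumed module path; the helper is defined in the next section


class SupplierAScraper(BaseScraper):
    def login(self):
        response = self.session.post(self.config['login_url'], data={
            'username': self.config['username'],
            'password': self.config['password']
        }, timeout=30)
        # Raises on 4xx/5xx; a rejected login that still returns 200
        # needs a site-specific check
        response.raise_for_status()

    def get_stock_levels(self):
        self.login()
        response = self.session.get(self.config['catalog_url'], timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')

        stock = {}
        for row in soup.select('table.products tr[data-sku]'):
            qty_cell = row.select_one('.stock-qty')
            if qty_cell is None:
                continue  # row has no quantity cell; skip it rather than crash
            stock[row['data-sku']] = parse_quantity(qty_cell.get_text(strip=True))

        return stock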
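
One way to wire the plugin modules together is a small registry that maps a supplier key to its scraper class. It is also a plausible shape for the get_scraper helper that the scheduler calls later. This is a minimal sketch, not the project's actual code: the register decorator, the SCRAPERS dict, the 'supplier_a' key, the get_scraper(key, config) signature, and the stand-in base class (without the HTTP session) are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: supplier key -> scraper class
SCRAPERS = {}


def register(key):
    """Class decorator that records a scraper class under a supplier key."""
    def decorator(cls):
        SCRAPERS[key] = cls
        return cls
    return decorator


class BaseScraper(ABC):
    """Stand-in for scrapers/base.py, without the HTTP session."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def get_stock_levels(self):
        """Returns dict of {sku: quantity}"""


@register('supplier_a')
class SupplierAScraper(BaseScraper):
    def get_stock_levels(self):
        # A real scraper would log in and parse HTML here
        return {'ABC-1': 5}


def get_scraper(key, config):
    """Instantiate the scraper registered for this supplier key."""
    try:
        scraper_cls = SCRAPERS[key]
    except KeyError:
        raise ValueError(f"No scraper registered for supplier {key!r}") from None
    return scraper_cls(config)
```

With this shape, adding a supplier means writing one module and decorating its class. The module still has to be imported somewhere at startup so that the decorator runs.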

Parsing Messy HTML

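Supplier websites are rarely clean. Stock quantities in particular arrive in many formats ("Out of stock", "50+", "Qty: 23"), so a single helper normalizes them to integers: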

import re

# Placeholder quantity for "in stock" listings that give no number
UNKNOWN_AVAILABLE_QTY = 99


def parse_quantity(text):
    """Handle various stock level formats."""
    text = text.strip().lower()

    if text in ('out of stock', 'oos', '-', ''):
        return 0
    if text in ('in stock', 'available'):
        return UNKNOWN_AVAILABLE_QTY

    # Drop thousands separators so "1,200" is not read as 1
    text = text.replace(',', '')

    # Take the first number: "50+" -> 50, "Qty: 23" -> 23, "23 units" -> 23
    match = re.search(r'\d+', text)
    return int(match.group()) if match else 0

Scheduled Scraping with APScheduler

APScheduler runs scraping jobs on a configurable schedule without requiring an external cron service:

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.interval import IntervalTrigger

scheduler = BackgroundScheduler()

def run_all_scrapers():
    for supplier in get_active_suppliers():
        try:
            scraper = get_scraper(supplier)
            stock_levels = scraper.get_stock_levels()
            update_database(supplier.id, stock_levels)
            sync_to_shopify(stock_levels)
        except Exception as e:
            log_error(supplier.name, str(e))
            send_alert(f"Scraper failed for {supplier.name}: {e}")

scheduler.add_job(
    run_all_scrapers,
    trigger=IntervalTrigger(hours=4),
    id='stock_sync',
    replace_existing=True
)
scheduler.start()
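
To make the interval configurable without a code change, one option is to read it from an environment variable. The sketch below assumes a hypothetical SCRAPE_INTERVAL_HOURS variable and a default of 4 hours to match the trigger above; neither the variable name nor the helper comes from the original project.

```python
import os


def scrape_interval_hours(environ=None, default=4):
    """Read the scrape interval in hours, falling back to the default.

    SCRAPE_INTERVAL_HOURS is a hypothetical variable name.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get('SCRAPE_INTERVAL_HOURS', '').strip()
    if not raw:
        return default
    try:
        hours = int(raw)
    except ValueError:
        return default  # ignore values that are not whole numbers
    # Never go below one hour, to avoid hammering supplier sites
    return max(1, hours)
```

The result could then be passed to the trigger as IntervalTrigger(hours=scrape_interval_hours()).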

Shopify Inventory Sync

After scraping, stock levels are pushed to Shopify:

import shopify

def sync_to_shopify(stock_levels):
    # Look up the location once rather than on every SKU
    location_id = get_primary_location()

    for sku, quantity in stock_levels.items():
        variant = find_shopify_variant_by_sku(sku)
        if variant is None:
            continue  # SKU is not listed in the store

        shopify.InventoryLevel.set(
            location_id=location_id,
            inventory_item_id=variant.inventory_item_id,
            available=quantity
        )

The Dashboard

The Flask dashboard shows:

Last sync status for each supplier (success/failure/timestamp)
Stock comparison: your Shopify stock vs. supplier stock
Low stock alerts: products below reorder threshold
Sync history: track stock level changes over time

@app.route('/dashboard')
def dashboard():
    suppliers = Supplier.query.all()
    low_stock = Product.query.filter(
        Product.stock_level < Product.reorder_threshold
    ).all()
    recent_syncs = SyncLog.query.order_by(
        SyncLog.created_at.desc()
    ).limit(50).all()
    
    return render_template('dashboard.html',
        suppliers=suppliers,
        low_stock=low_stock,
        recent_syncs=recent_syncs
    )
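
The stock comparison view comes down to a per-SKU diff between two mappings. The following is a rough sketch under assumed inputs: two plain {sku: quantity} dicts, a hypothetical compare_stock function, and a StockDiff record. The real dashboard presumably reads these values from its database models instead.

```python
from typing import NamedTuple, Optional


class StockDiff(NamedTuple):
    sku: str
    shopify_qty: Optional[int]   # None if the SKU is missing from Shopify
    supplier_qty: Optional[int]  # None if the supplier no longer lists it


def compare_stock(shopify_stock, supplier_stock):
    """Return one StockDiff per SKU whose quantities disagree, sorted by SKU."""
    diffs = []
    for sku in sorted(set(shopify_stock) | set(supplier_stock)):
        shopify_qty = shopify_stock.get(sku)
        supplier_qty = supplier_stock.get(sku)
        if shopify_qty != supplier_qty:
            diffs.append(StockDiff(sku, shopify_qty, supplier_qty))
    return diffs
```

Keeping None distinct from 0 lets the template tell "supplier reports zero" apart from "supplier does not list this SKU at all".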

Docker Deployment

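Consistent deployment matters when scrapers depend on specific Python package versions, so the app ships as a Docker image. One caveat: if the APScheduler instance is started inside the Flask app module, each gunicorn worker process starts its own copy, and with the two workers configured here every scrape would run twice. Running the scheduler in a single dedicated process, or using one worker, avoids that.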

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "-w", "2", "-b", "0.0.0.0:5000", "app:app"]

Ethical Considerations

Respect robots.txt and terms of service
Rate limit your requests: don't hammer supplier servers
Cache results: only scrape on schedule, not on every page load
Identify your bot: set a descriptive User-Agent header
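
The robots.txt, rate-limiting, and User-Agent points can be sketched with the standard library alone. The example assumes a hypothetical bot name and contact URL, and a Throttle helper invented for illustration; none of these names come from the original project. In the scrapers shown earlier, the User-Agent string would be set on self.session.headers.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical identity; use a name and contact page suppliers can recognize
USER_AGENT = 'StockSyncBot/1.0 (+https://example.com/bot-info)'


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to keep requests min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling throttle.wait() before each request keeps them spaced out, and the injectable clock and sleep arguments make the helper testable without real delays.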

Results

The stock sync dashboard eliminated 2+ hours of daily manual stock checking and reduced overselling incidents to near zero. Suppliers appreciate that we're not flooding their sites with requests, and inventory accuracy across all channels improved dramatically.