A Burla Cloud demo went viral on Hacker News on April 30. The project pulled every public Airbnb listing from Inside Airbnb's open data dumps across 119 cities and four quarterly snapshots, then ran the photos through CLIP and Claude Haiku Vision to find drug dens, messy kitchens, pet cameos, and TVs mounted way too high.
## The Infrastructure
| Resource | Scale |
|---|---|
| Photos processed | 1.7M unique URLs |
| Reviews analyzed | 50.7M |
| CPU workers | ~1,700 parallel |
| GPUs | 20× A100 for embedding and clustering |
| LLM | Claude Haiku (64 concurrent) |
| Total cost | Not disclosed (Burla demo) |
The pipeline ran on Burla, a parallel-processing library that spins up thousands of workers in seconds. Each stage was checkpointed to /workspace/shared so interrupted runs could resume where they left off. The team pre-staged model weights to avoid HuggingFace rate limits and rechunked Parquet files to prevent memory overflow when processing 50.7M reviews.
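The checkpoint-and-resume pattern can be sketched in a few lines of plain Python. This is a generic illustration, not Burla's API: `run_stage` and the JSON checkpoint format are invented for the example (the demo's actual checkpoint directory was /workspace/shared).

```python
import json
from pathlib import Path

def run_stage(name, fn, inputs, ckpt_dir=Path("/workspace/shared")):
    """Run one pipeline stage, skipping the work if a checkpoint exists.

    A crashed or restarted run re-enters here, finds the checkpoint file,
    and returns the saved result instead of recomputing the stage.
    """
    ckpt = ckpt_dir / f"{name}.json"
    if ckpt.exists():
        return json.loads(ckpt.read_text())
    result = fn(inputs)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    ckpt.write_text(json.dumps(result))
    return result
```

With each stage wrapped this way, rerunning the whole pipeline after a crash only recomputes the stages that never finished.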
## The Pipeline
Each finding went through a two-stage funnel:
- CLIP scoring (CPU, ~1.7K workers) — scored every photo for visual categories like "messy room," "pet-shaped pixels," "TV mounted high"
- Haiku Vision confirmation (API, 64 concurrent) — rejected false positives (throw pillows that looked like dogs, clutter that was just small rooms)
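The two-stage photo funnel is a cheap filter followed by an expensive verifier. A minimal sketch, where `clip_score` and `llm_confirm` are hypothetical stand-ins for the real CLIP and Haiku Vision calls:

```python
def two_stage_funnel(photos, clip_score, llm_confirm, threshold=0.8):
    """Score everything with the cheap model; send only high scorers to the LLM.

    clip_score:  fast, runs on every photo (the demo's ~1.7K CPU workers).
    llm_confirm: slow and expensive, runs only on photos above the threshold.
    """
    candidates = [p for p in photos if clip_score(p) >= threshold]
    return [p for p in candidates if llm_confirm(p)]
```

The threshold trades recall against LLM spend: lower it and more borderline photos (throw pillows, small rooms) hit the expensive verifier.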
For the funniest reviews, they used a three-tier cascade:
| Stage | Method | Input | Output | Time |
|---|---|---|---|---|
| 1 | Regex heuristics | 50.7M reviews | 200K candidates | 5 min |
| 2 | SBERT + KMeans (GPU) | 200K reviews | 12K diverse samples | 5 min |
| 3 | Claude Haiku | 12K reviews | 250 final picks | ~$0.65 |
This funnel saved ~99.98% of potential LLM costs while ensuring diversity.
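The cascade is three composed filters, each feeding a much smaller set to the next. The stage implementations below are toy stand-ins (the real regex heuristics, SBERT clustering, and Haiku prompts weren't published); the structure and the arithmetic are the point: only ~12K of 50.7M reviews, about 0.02%, ever reach the LLM.

```python
import re

# Toy stand-in for the regex heuristics; the actual patterns weren't published.
FUNNY_HINTS = re.compile(r"hilarious|bizarre|strangest|never laughed", re.I)

def cascade(reviews, sample_diverse, llm_rank, k=12_000):
    """Three-tier cascade: regex filter -> diverse sampling -> LLM ranking."""
    candidates = [r for r in reviews if FUNNY_HINTS.search(r)]  # 50.7M -> ~200K
    diverse = sample_diverse(candidates, k)                     # -> 12K in the demo
    return llm_rank(diverse)                                    # -> 250 final picks
```

In the demo, `sample_diverse` was SBERT embeddings clustered with KMeans on GPU, which keeps the 12K sample varied instead of dominated by one joke template.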
## What They Found
The team tested four hypotheses against 365-night calendar occupancy using bootstrap 95% confidence intervals:
| Hypothesis | Accepted? | Insight |
|---|---|---|
| Brighter photos = higher occupancy | Yes | Lighting matters for bookings |
| Messy listings = lower occupancy | No (inverted) | Messier places actually book MORE, likely a confound with lived-in/dorm-style listings |
| Pets in hero shots = higher occupancy | Yes | Pet cameos correlate with demand |
| Absurd/quirky photos = higher occupancy | Yes | Personality signals work |
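Each hypothesis test compares mean occupancy between two groups of listings (e.g. pet hero shots vs the rest) with a bootstrap 95% confidence interval on the difference of means. A minimal sketch of that procedure; the group definitions and data here are hypothetical:

```python
import random

def bootstrap_diff_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """95% bootstrap CI for mean(group_a) - mean(group_b).

    Resample each group with replacement, record the difference of means,
    and take the alpha/2 and 1 - alpha/2 quantiles of those differences.
    If the interval excludes 0, the occupancy gap is unlikely to be noise.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = [rng.choice(group_a) for _ in group_a]
        b = [rng.choice(group_b) for _ in group_b]
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2)) - 1]
```

The bootstrap makes no normality assumption, which matters here since calendar occupancy over 365 nights is bounded and often skewed.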
The visual findings: bare-bulb rooms with peeling walls that look like opium dens, kitchens that are genuinely chaotic (not just small), real pets (Haiku rejected the throw pillows), and TVs mounted above fireplaces.
## The Backlash
> "What's the point of scanning 2M photos to find cats in pictures?"
The Hacker News thread is split. Half the comments call it a waste of compute. Others point out it's essentially a Burla Cloud advertisement—the entire project exists to demonstrate Burla's parallel processing library. Someone noted that Inside Airbnb's guidelines explicitly say "Do not scrape data from the site" and "Only take the data you need."
> "There's a giant Airbnb x Burla logo at the top. People are saying there's a lawsuit pending, it's against guidelines, what's the point..."
## Verdict
Is this useful? Probably not for Airbnb hosts. But as a demo of what's possible when you parallelize CLIP + Haiku Vision across 1.7K workers and 20 A100s, it's impressive. The map of every flagged listing worldwide is worth a look, and the correlation findings (that absurd photos and messy rooms predict higher occupancy) are genuinely counterintuitive.
The project is open source under Burla-Cloud/examples on GitHub. Each pipeline stage is independently runnable and checkpointed. Created by jmp1062.
## Sources
- https://news.ycombinator.com/item?id=47963026
- https://news.ycombinator.com/item?id=47926535
- https://news.ycombinator.com/item?id=47926131
- https://github.com/Burla-Cloud/examples/blob/main/airbnb-burla-demo/WRITEUP.md
- https://x.com/betterhn20/status/2049910657559019718