Okay, so today I’m gonna walk you through my little adventure with scraping some Barca transfer news. It was a bit of a rollercoaster, but hey, we got there in the end!

First off, I needed to figure out where to even get the info. I spent a good hour just bouncing between sports sites, trying to find one that had decent news and wasn’t a total pain to navigate. Finally settled on one – let’s just call it “SoccerBuzz” – looked clean enough.
Next up, the actual scraping. I fired up Python, because that’s my go-to. I started with `requests` to grab the HTML. Easy peasy, right? Not so fast. SoccerBuzz had some sneaky anti-scraping stuff going on. I was getting blocked almost instantly. Bummer.
Alright, time for Plan B: Headers! I started adding some headers to my `requests` call, mimicking a real browser. User-Agent, Accept-Language, the whole shebang. Still getting blocked. Frustrating!
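For what it’s worth, the headers setup looked roughly like this — the specific User-Agent string below is just one plausible browser string, nothing magic about it:

```python
import requests

# Headers that make the request look like it came from a normal browser.
# The exact values are illustrative examples, not special ones.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
}

def fetch(url: str) -> str:
    """Grab a page while pretending to be a regular browser."""
    resp = requests.get(url, headers=BROWSER_HEADERS, timeout=10)
    resp.raise_for_status()  # fail loudly if we got blocked (403, 429, ...)
    return resp.text
```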
Then I remembered reading something about rotating proxies. Seemed like a pain, but worth a shot. I found a free proxy list online (yeah, yeah, I know, risky business). I rigged up my script to randomly pick a proxy from the list for each request. Guess what? It worked! For like, five minutes. Then those proxies got banned too. This was becoming a full-time job.
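The proxy-rotation bit boiled down to something like this — the addresses below are placeholders (203.0.113.x is a reserved documentation range); you’d swap in whatever list you dug up:

```python
import random

# Placeholder proxies -- 203.0.113.0/24 is reserved for documentation.
# In the real script these came from a (sketchy) free proxy list.
PROXY_LIST = [
    "203.0.113.10:8080",
    "203.0.113.11:3128",
    "203.0.113.12:8000",
]

def random_proxy(proxies):
    """Pick one proxy at random, shaped the way requests expects it."""
    addr = random.choice(proxies)
    return {"http": f"http://{addr}", "https": f"http://{addr}"}

# Then each request goes out through a different proxy:
# requests.get(url, proxies=random_proxy(PROXY_LIST), timeout=10)
```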
Okay, back to the drawing board. I started thinking about how humans browse the web. We don’t just hammer a website with requests every millisecond. We take breaks! So I added a `time.sleep()` call between requests, varying the delay randomly. It seemed to help a bit, but I was still getting blocked occasionally.
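The delay logic is trivial but worth showing — `random.uniform` keeps the gaps irregular, which looks more human than a fixed sleep:

```python
import random
import time

def polite_pause(lo=2.0, hi=6.0):
    """Sleep for a random, human-ish interval; returns the delay used."""
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay
```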
Then I figured, why not combine the proxy rotation with the delays? So I had my script pick a random proxy AND wait a random amount of time before each request. Bingo! That seemed to do the trick. I was finally able to scrape the news without getting blocked every other second.
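Put together, the loop looked roughly like this. The `fetch` function is passed in as a parameter just to keep the sketch self-contained — mine was a thin wrapper around `requests.get` with the headers and proxy plugged in:

```python
import random
import time

def scrape_politely(urls, fetch, proxies, lo=2.0, hi=6.0):
    """Fetch each URL through a random proxy, pausing randomly in between.

    `fetch` is whatever callable actually does the HTTP work, e.g. a thin
    wrapper around requests.get(url, headers=..., proxies=...).
    """
    pages = []
    for url in urls:
        proxy = random.choice(proxies)      # new proxy for every request
        time.sleep(random.uniform(lo, hi))  # irregular, human-ish gap
        pages.append(fetch(url, proxy))
    return pages
```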
With the HTML in hand, it was time to parse it. I used `BeautifulSoup` for this. It took a bit of poking around to find the right HTML tags that contained the actual transfer news, but eventually I got there. I was looking for the article titles and the short descriptions.
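The parsing looked something like this — the `div`/`h2`/`p` structure and class names below are invented for the example; SoccerBuzz’s actual markup was different (and messier):

```python
from bs4 import BeautifulSoup

# A toy snippet standing in for SoccerBuzz's markup. The tag structure
# and class names here are made up for illustration.
SAMPLE_HTML = """
<div class="article">
  <h2 class="title">  Barca close in on midfielder </h2>
  <p class="summary">Talks are reportedly at an advanced stage.</p>
</div>
"""

def extract_articles(html):
    """Pull (title, summary) pairs out of the page."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for article in soup.find_all("div", class_="article"):
        items.append({
            "title": article.find("h2", class_="title").get_text(strip=True),
            "summary": article.find("p", class_="summary").get_text(strip=True),
        })
    return items
```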
The HTML was kind of messy, with lots of extra tags and whitespace. So I used some string manipulation to clean it up. Stripped the leading/trailing whitespace, removed any weird characters, and generally made it look presentable.
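The cleanup was a couple of lines of standard string wrangling — collapse whitespace runs, drop anything non-printable, trim the ends:

```python
import re

def clean_text(raw):
    """Normalize scraped text: collapse whitespace, drop junk characters."""
    text = re.sub(r"\s+", " ", raw)  # tabs/newlines -> single space
    text = "".join(ch for ch in text if ch.isprintable())  # strip control chars
    return text.strip()
```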

Finally, I saved the extracted data to a CSV file. Just the date, title, and description for each news item. Nothing fancy, but it got the job done.
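The CSV step used the stdlib `csv` module. The function takes any writable file object, so the example stays self-contained — in the real script it was a file on disk (the `barca_news.csv` name below is just an example):

```python
import csv
import io

def write_news_csv(rows, fh):
    """Write (date, title, description) rows with a header line."""
    writer = csv.writer(fh)
    writer.writerow(["date", "title", "description"])
    writer.writerows(rows)

# In the real script, something like:
# with open("barca_news.csv", "w", newline="", encoding="utf-8") as f:
#     write_news_csv(news_rows, f)
```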
Lessons learned:
- Web scraping can be a real cat-and-mouse game. Sites don’t always want you scraping them, so you have to be sneaky.
- Rotating proxies and adding delays are your friends.
- `BeautifulSoup` is a lifesaver for parsing messy HTML.
It was a bit of a grind, but hey, now I’ve got my own little Barca transfer news aggregator. Maybe I’ll even build a little website around it. Who knows?