
Forum is being scraped again

There's a MASSIVE gulf of difference between 5K-10K real users just browsing the site and consuming content vs. 350,000+ bots scraping your site, following every possible link, downloading every possible image, cataloguing whatever it possibly can.
Archive anything over a year old and block all outside access to it. Maybe only make it available to members meeting certain criteria.


They can't scrape what they can't find.
 
Lots of avenues to explore for sure, but important to remember that all sites these days are at the whim of the almighty Google. Praise Google. Feed Google. All hail Google. 🤣
 
Wow, I guess these bots are really hammering the server hard, then. The amount of hammering must scale with the size of the site.

Mine gets hammered at random, but the highest it ever got was 1964 at once, which is peanuts.
 
If it's AI scrapers doing the damage, they could potentially follow us to other forums. Which would be troubling. Though they might not if the new forum lacked the depth of content that's present here.
 
If they're simply hitting the forum too fast, I wonder if we could just set up rate limiting: if an IP hits the server more than X times per second, it gets temporarily blocked. Not sure how easy that would be to set up; I guess it could be done with a script that looks at the Apache log.
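For what it's worth, here is a rough sketch of that log-scanning idea in Python. It assumes the access log uses Apache's common/combined format (client IP first, timestamp in square brackets) and an English/C locale for month names. The function name `find_heavy_hitters` and the thresholds are made up for illustration. It only reports the offending IPs and does not block anything by itself.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Start of an Apache common/combined log line (assumed format):
# client IP, identd, user, then [timestamp].
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_heavy_hitters(lines, max_requests=20, window_seconds=1):
    """Return the set of IPs that made more than max_requests requests
    inside any sliding window of window_seconds (hypothetical thresholds)."""
    window = timedelta(seconds=window_seconds)
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    offenders = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, raw_time = match.groups()
        when = datetime.strptime(raw_time, TIME_FORMAT)
        hits = recent[ip]
        hits.append(when)
        # Drop timestamps that have fallen out of the window.
        while when - hits[0] >= window:
            hits.popleft()
        if len(hits) > max_requests:
            offenders.add(ip)
    return offenders
```

You would call it with the open log file, e.g. `find_heavy_hitters(open(path))`, where `path` is wherever your server writes its access log. The actual temporary block would be a separate step, such as a firewall rule or a tool like fail2ban. Keep in mind that scrapers spread across many IPs can stay under any per-IP limit.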
 
F*** Google
Necessary evil. Can you imagine how bad internet search would be without feeding that unholy beast? They exist because they fulfill a purpose that no one, not even Microsoft with their billions, has been able to challenge. When's the last time you actually got a useful result from Bing? No matter how bad they are, their stuff works and helps hundreds of millions, if not billions, of people daily. I don't get a single cent from Google (got no sense to understand how AdSense works), but if they suddenly vanished into thin air, the ensuing chaos could take years to settle, and the company or companies filling the void might never approach the quality they provide.

And don't even dare to imagine a world where Apple is the only quality mobile platform without the existence of Android. You F Google, you F everybody.
 
Mhhh maybe we can
Yeah, that is true for the biggest corporations.
 
Google decided to block my server's IP at some point, so mail to Gmail won't go through. My site does send out a lot of email, but it's not spam, just registration, password resets, etc. A lot of it is bot generated: they sign up and complete the first registration step, then get stuck at the captcha that's required for the final stage. This does generate a lot of email traffic though, so I may need to modify it so the captcha is on the main registration page. I may also look at offloading SMTP to a 3rd party. SMTP is a pain.

Code:
to=<[user]@gmail.com>, relay=gmail-smtp-in.l.google.com[142.250.31.26]:25, delay=0.85, delays=0.14/0.01/0.24/0.46, dsn=5.7.1, status=bounced (host gmail-smtp-in.l.google.com[142.250.31.26] said: 550-5.7.1 [144.217.157.4      18] Gmail has detected that this message is likely 550-5.7.1 suspicious due to the very low reputation of the sending IP address. 550-5.7.1 To best protect our users from spam, the message has been blocked. 550-5.7.1 For more information, go to 550 5.7.1  https://support.google.com/mail/answer/188131 af79cd13be357-7e2f30b81f7si23159785a.170 - gsmtp (in reply to end of DATA command))
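If anyone wants to sift a mail log for this kind of bounce, a minimal sketch in Python is below. It assumes log lines with the same `key=value` layout as the one above. The function names and the two-entry lookup table are my own for illustration, not part of any mail server's tooling. The status code meanings come from the standard enhanced status codes: 5.1.1 is a bad destination mailbox, and 5.7.1 is a delivery refused for policy reasons, which is what Gmail uses here for the low IP reputation block.

```python
import re

# Pull the few key=value fields we care about out of a delivery log line.
# The "to" value is wrapped in <...>; the others run up to a comma or space.
FIELD = re.compile(r'\b(to|relay|dsn|status)=(<[^>]*>|[^,\s]+)')

# Illustrative subset of enhanced status codes, not a complete table.
DSN_MEANING = {
    "5.1.1": "recipient address does not exist",
    "5.7.1": "refused by policy (e.g. sender reputation)",
}


def parse_delivery_line(line):
    """Return a dict with the to/relay/dsn/status fields found in the line."""
    fields = {}
    for key, value in FIELD.findall(line):
        # Keep the first occurrence of each key and drop the <> around "to".
        fields.setdefault(key, value.strip("<>"))
    return fields


def explain_bounce(line):
    """Return a short reason for a bounced line, or None if it did not bounce."""
    fields = parse_delivery_line(line)
    if fields.get("status") != "bounced":
        return None
    return DSN_MEANING.get(fields.get("dsn"), "other permanent failure")
```

Running `explain_bounce` over each line of the log would separate "the address was mistyped" bounces from "the receiving side refused us" bounces, which is the distinction that matters when deciding whether the sending IP is the problem.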
 
You may have mistyped your email since the messages bounced back. You should be able to log in now.
Not going to bother and thanks for raising the possibility that I have such fat pudgy fingers that I can't even type my own email properly. It bounced back for the same reason it did for Reds' forum. Non-compliance with Google's updated rules of engagement.
 
The message literally says:



Remote host said: 550-5.1.1 The email account that you tried to reach does not exist. Please try double-checking the recipient's email address for typos or unnecessary spaces.
 
If you are able to get that much detail from the Admin or if you are yourself the admin, PM me the email please so I can admit that YES, it was my mistake and then I will try again. Thanks.
 
@igor_kavinski @GodisanAtheist I sent a test email from my personal email just to test the Gmail fiasco. My personal email server is on a different IP than the server used by the forum, so I was curious whether it would go through. According to the logs, it did. Also, your account is active; I activated it manually for now. I still need to sort out the forum sending mail to Gmail.

I may look at offloading that to a 3rd party if worst comes to worst. Google doesn't really like small guys hosting their own mail, it seems. Good way to convince people to use their cloud services lol.
 