Are you hosting stuff that's particularly prone to crawling (and by crawlers that don't respect robots.txt)? Of the spider traffic we see, the vast majority of it comes from Google and the other major search engines.
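For contrast, a well-behaved crawler checks robots.txt before fetching anything, and Python's standard library even ships a parser for it. A minimal sketch of what the polite ones do (the domain and user-agent string here are just placeholders):

    from urllib import robotparser

    # Fetch and parse the target site's robots.txt (placeholder domain).
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A polite crawler asks before every fetch; the bots described
    # above simply skip this step.
    if rp.can_fetch("MyCrawler/1.0", "https://example.com/some/page"):
        print("allowed to fetch")
    else:
        print("disallowed by robots.txt")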
One example: there are several people who apparently scrape the front page of HN (and proggit, etc.) and then proceed to download all of those links repeatedly, every minute (or second!) for several hours. Same link, over and over and over. I can only imagine what get-rich-quick scheme would require such behavior.
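You can spot that pattern straight from an access log: tally (client IP, URL) pairs, and anything re-fetching the same URL hundreds of times stands out immediately. A rough sketch, assuming a common-log-format file (the path, field positions, and threshold are all assumptions about your setup):

    from collections import Counter

    # Count (client_ip, url) pairs from a common-log-format access log.
    hits = Counter()
    with open("access.log") as f:
        for line in f:
            parts = line.split()
            if len(parts) > 6:
                ip, url = parts[0], parts[6]  # remote addr, request path
                hits[(ip, url)] += 1

    # Flag clients hammering one URL; 100 is an arbitrary cutoff.
    for (ip, url), n in hits.most_common(20):
        if n > 100:
            print(f"{ip} fetched {url} {n} times")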
Whoa, that explains why websites sometimes go down so quickly after they get linked on reddit. Most hosting surely can't handle hundreds of simultaneous requests, but I've even seen sites linked from smaller subreddits go down fast.
I do crawling from EC2, and yes, I would not like a 1Gbps traffic spike myself.
Do you deal with generic webpage crawlers that way, or with targeted API abuse? The former can be filtered out fairly smoothly with something like Cloudflare, for instance.
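For the targeted API abuse case, where a CDN in front helps less, the usual fallback is a per-client token bucket at the application layer. A minimal sketch, not any particular framework's API (the rate and burst numbers are made up, and a real deployment would likely key on API tokens rather than IPs):

    import time

    class TokenBucket:
        """Per-client token bucket: refill at `rate` tokens/sec,
        hold at most `burst` tokens."""
        def __init__(self, rate=5.0, burst=10.0):
            self.rate, self.burst = rate, burst
            self.tokens = burst
            self.last = time.monotonic()

        def allow(self):
            # Refill based on elapsed time, capped at the burst size.
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    # One bucket per client IP (or per API key, in practice).
    buckets = {}

    def check(client_ip):
        bucket = buckets.setdefault(client_ip, TokenBucket())
        return bucket.allow()  # False => respond with 429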
Maybe they want to control indiscriminate acquisition of infrastructure across many different departments. You'd be surprised how many CFOs/CIOs don't realize there's an invisible budget line item in every small department which, added up, would be a big item for the whole company.
Don't see them here or in the subforum yet.
https://forums.aws.amazon.com/ann.jspa?annID=1701