    > Intelligent IP rotation

    > User behavior simulation

    > Automatic ban detection and resolution

    > Retrying failed requests

    > Multi-threaded and easy to scale

Focus on the business side and we will take care of the web scraping. Enjoy!


    Web scraping, also called web harvesting or web data extraction, is the process of extracting data from websites. The term usually refers to automated processes implemented with specialized software.

    Here is a typical web scraping flow: say we want to extract specific information (title, price, description) about books listed in an online store (for example, amazon.com). The software opens the relevant web pages, grabs the HTML of each page, converts it to a structured format, and stores it in a database or in a CSV/XLS file. Web scraping fits many use cases. Any automated data extraction from websites counts as web scraping: lead generation, product details extraction, price comparison, real estate monitoring, trend monitoring, tickets, social media, and many more.
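The flow above (grab the HTML, convert it to a structured format, store it as CSV) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library. The sample markup, the CSS class names, and the helper names `BookParser`, `extract_books`, and `to_csv` are assumptions made for the example, not the layout of any real store.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup; a real store's HTML will look different.
SAMPLE_HTML = """
<div class="book"><h2 class="title">Dune</h2><span class="price">9.99</span><p class="description">Desert planet epic.</p></div>
<div class="book"><h2 class="title">Emma</h2><span class="price">4.50</span><p class="description">A comedy of manners.</p></div>
"""

FIELDS = ("title", "price", "description")


class BookParser(HTMLParser):
    """Collects one dict per <div class="book"> element."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._field = None  # field whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "book":
            self.books.append({})  # start a new record
        elif css_class in FIELDS and self.books:
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self.books[-1][self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # An empty element must not capture text from a later element.
        self._field = None


def extract_books(html):
    """Convert raw HTML into a list of dicts (the structured format)."""
    parser = BookParser()
    parser.feed(html)
    return parser.books


def to_csv(rows):
    """Serialize the extracted rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a real scraper the HTML would come from an HTTP response rather than a string constant, and the CSV text would be written to a file or the rows inserted into a database.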

    The internet contains a huge amount of structured information. If you still don't use web scraping in your business, consider it: you may be missing opportunities.

A proxy is a remote server that relays traffic from the client to the target website and back. Here are the most popular use cases:

  • hide your real IP address
  • bypass location blocks (some websites allow access only from specific countries)
  • speed up data delivery (in some cases, sending traffic via a proxy can be faster than sending it directly over your network)
  • avoid captchas and other bans
  • add extra traffic encryption, among other cases

For web scraping you will usually use HTTP/HTTPS or SOCKS4/SOCKS5 proxies.

Here are the most popular groups of proxies for web scraping that you need to know:

  • dedicated (used by one user only)
  • shared (shared between a few users)
  • automatically rotated on every request (a new IP for each request)
  • automatically rotated once per interval (the IP is usually rotated once every 3, 5, or 15 minutes)
  • smart proxy (a high-level proxy that does all the dirty work, such as proxy rotation, error handling, bypassing bans and captchas, retrying requests, and user behavior simulation, and in most cases returns a successful response, so you don't need to handle bans and proxy management on your side)
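To make the difference between the two rotation styles concrete, here is a small client-side sketch. It assumes a fixed pool of proxies and simple round-robin selection. The class names, the `clock` parameter, and the addresses (taken from a documentation-only IP range) are hypothetical, and hosted rotating proxies may choose IPs quite differently.

```python
import itertools
import time

# Hypothetical proxy addresses from a documentation-only range.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]


class PerRequestRotator:
    """Hands out a different proxy on every call (round-robin)."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        return next(self._cycle)


class IntervalRotator:
    """Keeps the same proxy until interval_seconds have elapsed."""

    def __init__(self, proxies, interval_seconds, clock=time.monotonic):
        self._cycle = itertools.cycle(proxies)
        self._interval = interval_seconds
        self._clock = clock  # injectable so the timing can be tested
        self._current = next(self._cycle)
        self._switched_at = clock()

    def next_proxy(self):
        now = self._clock()
        if now - self._switched_at >= self._interval:
            # The interval has passed: move on to the next proxy.
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current
```

For example, `IntervalRotator(PROXIES, 5 * 60)` would keep returning the same address for five minutes, while `PerRequestRotator(PROXIES)` returns a new one on each call.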

OneProxy is an HTTP/HTTPS proxy with a host:port pair such as gate.oneproxy.net:8001 and an access token, which you receive after subscribing and use as the proxy login:

$ curl --proxy gate.oneproxy.net:8001 --proxy-user MY_ACCESS_TOKEN: -L http://example.com
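The same request can be sketched in Python with the standard library. This is only an illustration of the curl command above, assuming the token is sent as the Basic proxy login with an empty password (which is what `--proxy-user MY_ACCESS_TOKEN:` does). The helper names are hypothetical and `MY_ACCESS_TOKEN` is a placeholder. The header is built by hand because `urllib` skips credentials embedded in the proxy URL when the password is empty.

```python
import base64
import urllib.request

PROXY = "gate.oneproxy.net:8001"  # host:port pair from the text above
ACCESS_TOKEN = "MY_ACCESS_TOKEN"  # placeholder, replace with your own token


def proxy_auth_header(token):
    """Basic credentials: token as the login, empty password."""
    credentials = f"{token}:".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_request(url, token=ACCESS_TOKEN):
    """Create a request that carries the proxy credentials."""
    request = urllib.request.Request(url)
    request.add_header("Proxy-Authorization", proxy_auth_header(token))
    return request


def make_opener(proxy=PROXY):
    """Route both http and https traffic through the proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    return urllib.request.build_opener(handler)


def fetch(url):
    """Perform a real network call; needs a valid token to succeed."""
    with make_opener().open(build_request(url), timeout=30) as response:
        return response.read()
```

Calling `fetch("http://example.com")` would send the request through the proxy. `urllib` follows redirects by default, similar to curl's `-L` flag.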

