Artem Golubin

Shady economics of proxy services

Wed, 04 May 2022 16:28:08 -0000

Residential proxies are the most in-demand type of proxy on the market, and their price increases each year.

In this article, I want to write down my understanding of the economics of proxy services. In particular, I describe types of proxy offerings, their typical clients, and why the majority of the market is supplied by malware.

Proxy types

There are two distinctive types of proxies on the market:

Data center (server) proxies
Residential proxies (broadband and mobile)

Data center proxies

As the name implies, such proxies reside in data centers. To create a pool of proxies, proxy providers lease or buy IPv4 subnets. Data center proxies are usually relatively cheap compared to residential proxies.

[....]

How masscan works

Sat, 30 Apr 2022 15:06:22 -0000

Masscan is a fast port scanner capable of scanning the entire IPv4 internet in under five minutes. To achieve maximum speed, it requires a stable 10 Gigabit link and a custom network driver for Linux. In comparison, a naive port scanner implementation can take weeks or even months. This article describes the key features behind the internal design of masscan.
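To get a feel for the numbers, here is a back-of-the-envelope calculation. It assumes one probe packet per address on a single port and ignores reserved ranges and retransmissions, so it is only a rough estimate of the required rate, not a figure taken from masscan itself.

```python
# Rough estimate: how many probes per second does a five-minute scan need?
IPV4_ADDRESSES = 2 ** 32   # size of the whole IPv4 address space
SCAN_SECONDS = 5 * 60      # "under five minutes"

packets_per_second = IPV4_ADDRESSES / SCAN_SECONDS
print(f"{packets_per_second / 1e6:.1f} million packets per second")  # 14.3
```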

What is port scanning?

Port scanning is a method to determine which ports on a specified list of IPs are open and accept connections. People use it to find web servers, proxy servers, databases, and other internet services for security research. Usually, port scanners target the TCP/IP protocol, although UDP port scanning is also possible. This article focuses only on TCP/IP (IPv4) scanning on Linux, since only the Linux version can handle more than two million packets per second.
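For illustration, here is a minimal sketch of the naive approach: a blocking TCP connect scan built on the standard socket module. The helper names is_port_open and scan are made up for this example. This is not how masscan works internally; it only shows what "checking whether a port accepts connections" means in practice.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a full TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


def scan(host, ports, timeout=1.0):
    """Check ports one by one and return the open ones (slow by design)."""
    return [port for port in ports if is_port_open(host, port, timeout)]
```

Each probe here waits for the operating system to complete or reject a connection before moving on, which is why this style of scanner does not scale to the whole internet. Only scan hosts you are allowed to test.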

[....]

On code isolation in Python

Wed, 21 Oct 2020 14:55:36 -0000

I started learning Python in 2009 with a pretty challenging task that was a somewhat unusual use of Python. I was working on a desktop application that used PyQt for the GUI and Python as the main language.

To hide the code, I embedded the Python interpreter into a standalone Windows executable. There are a lot of solutions for this (e.g. pyinstaller, pyexe), and they all work similarly. They compile your Python scripts to bytecode files and bundle them with an interpreter into an executable. Compiling scripts down to bytecode makes it harder for people with bad intentions to get the source code and crack or hack your software, because the bytecode has to be extracted from the executable and decompiled. Decompilation can also produce obfuscated code that is much harder to understand.
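The compile-to-bytecode step can be sketched with the standard library alone. This is only an illustration of the idea, not what any particular bundler does: the source string and the "<hidden>" file name are made up, and real tools add their own packaging on top.

```python
import marshal

# A tiny "script" we want to ship without its source text.
source = "def greet(name):\n    return 'Hello, ' + name\n"

# Compile to a code object and serialize it, similar in spirit to the
# payload of a .pyc file (which adds a small header in front).
code = compile(source, "<hidden>", "exec")
blob = marshal.dumps(code)

# Later, the bundled interpreter can load and run the bytecode directly.
namespace = {}
exec(marshal.loads(blob), namespace)
result = namespace["greet"]("world")
print(result)  # Hello, world
```

Note that marshal output is specific to the Python version that produced it, and that names and string constants survive compilation, which is part of why bytecode is a hurdle rather than real protection.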

[....]

Clipboard API for browsers is inconsistent

Tue, 08 Sep 2020 12:55:32 -0000

I use the clipboard a lot when I work with images or screenshots. It allows me to upload or transfer images that are not on my disk. For example, I can copy an image from a graphics editor to browsers, messengers, and other applications. This approach keeps my Downloads folder clean and sane.

I always thought that when you paste an image into a browser, its content stays the same. It doesn't. As it turned out, when you copy a JPG image to Chrome, it converts it to PNG.

This is super bad for the web because your 400KB JPG can become a 2.5MB PNG file. Many content creators paste their images to WYSIWYG editors,[....]

How to turn an ordinary gzip archive into a database

Mon, 24 Aug 2020 00:11:05 -0000

This article demonstrates how specially crafted but ordinary gzip archives can be used as database-like storage. It also introduces a Python package and explains how it works.

gzip is a popular file compression format to store large amounts of raw data. It has a good data compression ratio, but relatively slow compression/decompression speed.

Many companies use it in big data applications when they need to store compressed CSV or JSON Lines files. Such file formats are row-oriented and usually processed line by line. gzip can save a lot[....]
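The excerpt does not say how the archive is crafted, so the following is only a sketch of one possible approach, not the package's actual method. It relies on a documented property of the format: several gzip members concatenated together still form a valid gzip file. The index structure and the read_record helper are hypothetical names for this example.

```python
import gzip
import io

records = [b'{"id": 1}\n', b'{"id": 2}\n', b'{"id": 3}\n']

archive = io.BytesIO()
index = []  # (offset, length) of each compressed member
for record in records:
    member = gzip.compress(record)  # one independent gzip member per record
    index.append((archive.tell(), len(member)))
    archive.write(member)

data = archive.getvalue()


def read_record(data, index, i):
    """Decompress only the i-th member instead of the whole archive."""
    offset, length = index[i]
    return gzip.decompress(data[offset:offset + length])


# Random access to a single row:
print(read_record(data, index, 1))
# Ordinary tools still see one regular archive with all rows in order:
print(gzip.decompress(data) == b"".join(records))
```

One member per row wastes space because every member carries its own header and compression state. Grouping many rows per member would trade lookup granularity for a better compression ratio.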

How to track and display profile views on GitHub

Mon, 06 Jul 2020 20:24:38 -0000

As part of recent design changes, GitHub has introduced READMEs for profiles. By creating a repo with your name and adding a README.md file with markdown to it, you can now add a rich description of yourself.

Here is an example of what it looks like:

This feature may not be available for your profile yet. GitHub is rolling it out selectively.

[....]

Public SSH keys can leak your private infrastructure

Thu, 28 Nov 2019 11:16:56 -0000

This article describes a minor security flaw in the SSH authentication protocol that can lead to unexpected private infrastructure disclosure. It also provides a PoC written in Python.

Asymmetric cryptography, or public-key cryptography, is the most common way to identify and authorize a user on an SSH server. It is also used to encrypt and manage access to different protocols or tools, such as Git, SFTP, SCP, and rsync.

Asymmetric cryptography uses a pair of keys: a public key and a private key. A public key can be derived from a private key, but not vice versa. It's a well-known fact that public keys are meant to be public and can be widely shared. Unlike public keys, private keys must not be shared and must be kept secret by their owners.

[....]

Detecting SQL injections in Python code using AST

Sun, 28 Apr 2019 12:13:58 -0000

Python has a built-in ast module that lets you inspect, parse and edit Python code. AST stands for abstract syntax tree, a data structure that makes it easy to analyze, inspect and edit programming language code.

When working with abstract trees, you don't have to worry about the syntax of a programming language. Abstract trees represent relations between objects, operators and language expressions.

This article shows a real-world example of how you can use this module to detect SQL injection vulnerabilities in Python code.
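As a taste of the idea, here is a minimal sketch, not the article's actual detector. It assumes queries are run through a method named execute and flags calls whose first argument is built with %, +, an f-string, or str.format. The SOURCE sample and the helper names are invented for the example.

```python
import ast

SOURCE = '''
def get_user(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = '%s'" % name)
    cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")
    cursor.execute("SELECT * FROM users WHERE name = %s", (name,))
'''


def is_format_call(node):
    """True for expressions like "...".format(...)."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "format")


class ExecuteVisitor(ast.NodeVisitor):
    def __init__(self):
        self.findings = []  # line numbers of suspicious execute() calls

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr == "execute" and node.args:
            query = node.args[0]
            # BinOp covers "..." % x and "..." + x; JoinedStr is an f-string
            if isinstance(query, (ast.BinOp, ast.JoinedStr)) or is_format_call(query):
                self.findings.append(node.lineno)
        self.generic_visit(node)


def find_suspicious_queries(source):
    visitor = ExecuteVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


print(find_suspicious_queries(SOURCE))  # [3, 4]
```

The third call passes parameters separately and is not flagged. A heuristic like this produces false positives (for example, concatenating two constants) and misses queries assembled in a variable beforehand, so treat it as a starting point.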

[....]

How Python saves memory when storing strings

Thu, 09 Aug 2018 22:33:30 -0000

Since Python 3, the str type uses Unicode representation. Unicode strings can take up to 4 bytes per character depending on the encoding, which sometimes can be expensive from a memory perspective.

To reduce memory consumption and improve performance, Python uses three kinds of internal representations for Unicode strings:

1 byte per char (Latin-1 encoding)
2 bytes per char (UCS-2 encoding)
4 bytes per char (UCS-4 encoding)

When programming in Python, all strings behave the same, and most of the time we don't notice any difference. However, the difference can be significant and sometimes unexpected when working with large amounts of text.
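The three representations can be observed with sys.getsizeof. The sketch below assumes CPython 3.3 or newer, since other implementations may store strings differently. The bytes_per_char helper is a made-up name; it subtracts the sizes of two strings of different lengths so the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by comparing two string lengths."""
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


for ch in ["a", "é", "я", "😀"]:
    print(repr(ch), bytes_per_char(ch))
```

On CPython this reports 1 byte for "a" and "é" (both fit in Latin-1), 2 bytes for the Cyrillic "я", and 4 bytes for the emoji. A single wide character is enough to switch the whole string to the wider representation.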

[....]

How virtual environment libraries work in Python

Fri, 29 Jun 2018 11:07:40 -0000

Have you ever wondered what happens when you activate a virtual environment and how it works internally? Here is a quick overview of the internals behind popular virtual environment tools, e.g., virtualenv, virtualenvwrapper, conda, pipenv.

Initially, Python didn't have built-in support for virtual environments, and this feature was implemented as a hack. As it turns out, this hack is based on a simple concept.

When the Python interpreter starts, it searches for the site-specific directory where all packages are stored. The search starts at the parent directory of the Python executable's location and continues by backtracking the path (i.e., looking at the parent directories) until it reaches the root directory. To determine if it's a site-specific directory, Python looks for the os.py module, which it requires in order to work.
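The search described above can be approximated in a few lines. This is a simplified sketch, not CPython's real startup code, which also handles pyvenv.cfg, environment variables, and platform-specific layouts. The find_prefix helper and the Unix-style lib/pythonX.Y/os.py landmark are assumptions made for the example.

```python
import sys
from pathlib import Path


def find_prefix(executable, landmark):
    """Walk up from the executable's directory until `landmark` exists."""
    for directory in Path(executable).resolve().parents:
        if (directory / landmark).is_file():
            return directory
    return None  # reached the root without finding the landmark


# Unix-style landmark; Windows installations use Lib\os.py instead.
version = f"python{sys.version_info.major}.{sys.version_info.minor}"
landmark = Path("lib") / version / "os.py"

print(find_prefix(sys.executable, landmark))
# For comparison, the values the interpreter actually settled on:
print(sys.prefix, sys.base_prefix)
```

In Python 3.3 and later, sys.prefix differs from sys.base_prefix when the interpreter runs inside a virtual environment, which is a convenient way to check whether one is active.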

[....]