<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Artem Golubin</title><link>https://rushter.com/blog/feed/</link><description>Python, Machine learning, NLP, websec, etc.</description><language>en</language><lastBuildDate>Wed, 04 May 2022 16:28:08 -0000</lastBuildDate><item><title>Shady economics of proxy services</title><link>https://rushter.com/blog/proxy-services/</link><description>&lt;p&gt;Residential proxies are the most in-demand type of proxy on the market. Their price increases each year.&lt;/p&gt;
&lt;p&gt;In this article, I want to write down my understanding of the economics of proxy services. In particular, I describe types of proxy offerings, their typical clients, and why the majority of the market is supplied by malware.&lt;/p&gt;
&lt;h3&gt;Proxy types&lt;/h3&gt;
&lt;p&gt;There are two distinct types of proxies on the market:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data center (server) proxies&lt;/li&gt;
&lt;li&gt;Residential proxies (broadband and mobile)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Data center proxies&lt;/h3&gt;
&lt;p&gt;As the name implies, such proxies reside in data centers. To create a pool of proxies, proxy providers lease or buy IPv4 subnets. Data center proxies are usually cheap compared to residential proxies.&lt;/p&gt;[....]</description><pubDate>Wed, 04 May 2022 16:28:08 -0000</pubDate><guid>https://rushter.com/blog/proxy-services/</guid></item><item><title>How masscan works</title><link>https://rushter.com/blog/how-masscan-works/</link><description>&lt;p&gt;&lt;a href="https://github.com/robertdavidgraham/masscan"&gt;Masscan&lt;/a&gt; is a fast port scanner capable of scanning the entire IPv4 internet in under five minutes. To achieve maximum speed, it requires a stable 10 Gigabit link and a custom network driver for Linux. In comparison, a naive port scanner implementation can take weeks or even months. This article describes the key features behind the internal design of masscan.&lt;/p&gt;
&lt;h3&gt;What is port scanning?&lt;/h3&gt;
&lt;p&gt;Port scanning is a method to determine which ports on a specified list of IPs are open and accept connections. People use it to find web servers, proxy servers, databases, and other internet services for security research. Usually, port scanners target the &lt;a href="https://en.wikipedia.org/wiki/Internet_protocol_suite"&gt;TCP/IP&lt;/a&gt; protocol, although &lt;a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol"&gt;UDP&lt;/a&gt; port scanning is also possible. This article focuses only on TCP/IP (&lt;a href="https://en.wikipedia.org/wiki/IPv4"&gt;IPv4&lt;/a&gt;) on Linux, since only the Linux version of masscan can handle more than two million packets per second.&lt;/p&gt;[....]</description><pubDate>Sat, 30 Apr 2022 15:06:22 -0000</pubDate><guid>https://rushter.com/blog/how-masscan-works/</guid></item><item><title>On code isolation in Python</title><link>https://rushter.com/blog/python-code-isolation/</link><description>&lt;p&gt;I started learning Python in 2009 with a pretty challenging task and a somewhat unusual use of Python. I was working on a desktop application that used PyQt for the GUI and Python as the main language.&lt;/p&gt;
&lt;p&gt;To hide the code, I embedded the Python interpreter into a standalone Windows executable. There are a lot of solutions to do so (e.g. &lt;a href="https://www.pyinstaller.org/"&gt;pyinstaller&lt;/a&gt;, &lt;a href="https://www.py2exe.org/"&gt;py2exe&lt;/a&gt;), and they all work similarly. They compile your Python scripts to &lt;a href="https://docs.python.org/3/glossary.html#term-bytecode"&gt;bytecode&lt;/a&gt; files and bundle them with an interpreter into an executable. Compiling scripts down to bytecode makes it harder for people with bad intentions to get the source code and crack or hack your software. Bytecode has to be extracted from the executable and decompiled. Decompilation can also produce obfuscated code that is much harder to understand.&lt;/p&gt;[....]</description><pubDate>Wed, 21 Oct 2020 14:55:36 -0000</pubDate><guid>https://rushter.com/blog/python-code-isolation/</guid></item><item><title>Clipboard API for browsers is inconsistent</title><link>https://rushter.com/blog/clipboard-api/</link><description>&lt;p&gt;I use the clipboard a lot when I work with images or screenshots. It allows
me to upload or transfer images that are not on my disk. For example, I
can copy an image from a graphics editor to browsers, messengers, and
other applications. This approach keeps my Downloads folder clean and
sane.&lt;/p&gt;
&lt;p&gt;I always thought that when you paste an image into a browser, its content
stays the same. It doesn't. As it turns out, when you copy a JPG image into
Chrome, the browser converts it to PNG.&lt;/p&gt;
&lt;p&gt;This is super bad for the web because your 400KB JPG can become a &lt;strong&gt;2.5MB&lt;/strong&gt;
PNG file. Many content creators paste their images to WYSIWYG editors,[....]</description><pubDate>Tue, 08 Sep 2020 12:55:32 -0000</pubDate><guid>https://rushter.com/blog/clipboard-api/</guid></item><item><title>How to turn an ordinary gzip archive into a database</title><link>https://rushter.com/blog/gzip-indexing/</link><description>&lt;p&gt;This article demonstrates how specially crafted but ordinary gzip archives can be
used as database-like storage. It also introduces a &lt;a href="https://github.com/ProfoundNetworks/gzipi"&gt;Python package&lt;/a&gt; and
explains how it works.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Gzip"&gt;gzip&lt;/a&gt; is a popular file compression format for storing large amounts of raw
data. It has a good compression ratio but relatively slow
compression and decompression speeds.&lt;/p&gt;
&lt;p&gt;Many companies use it in big data applications when they need to store
compressed CSV or JSON Lines files. Such file formats are
&lt;strong&gt;row-oriented&lt;/strong&gt; and are usually processed line by line. gzip can save a lot[....]</description><pubDate>Mon, 24 Aug 2020 00:11:05 -0000</pubDate><guid>https://rushter.com/blog/gzip-indexing/</guid></item><item><title>How to track and display profile views on GitHub</title><link>https://rushter.com/blog/github-profile-markdown/</link><description>&lt;p&gt;As part of recent design changes, GitHub has introduced READMEs for profiles. By &lt;a href="https://dev.to/web/design-github-profile-using-readme-md-8al"&gt;creating a repo&lt;/a&gt; with your name and adding a &lt;code&gt;README.md&lt;/code&gt; file with Markdown to it, you can now add a rich description of yourself.&lt;/p&gt;
&lt;p&gt;Here is an example of what it looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://rushter.com/static/uploads/img/github_profile.jpg" class="ui centered image" alt="Python reference count graph" width="650"&gt;&lt;/p&gt;
&lt;div class="text-center"&gt;This feature may not be available for your profile yet. GitHub is rolling it out selectively.&lt;/div&gt;[....]</description><pubDate>Mon, 06 Jul 2020 20:24:38 -0000</pubDate><guid>https://rushter.com/blog/github-profile-markdown/</guid></item><item><title>Public SSH keys can leak your private infrastructure</title><link>https://rushter.com/blog/public-ssh-keys/</link><description>&lt;p&gt;This article describes a minor security flaw in the SSH authentication protocol that can lead to unexpected private infrastructure disclosure. It also provides a PoC written in Python.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Public-key_cryptography"&gt;Asymmetric cryptography&lt;/a&gt;, or public-key cryptography, is the most common way to identify and authorize a user on an SSH server. It is also used to encrypt and manage access to different protocols or tools, such as Git, SFTP, SCP, and rsync.&lt;/p&gt;
&lt;p&gt;Asymmetric cryptography uses a pair of keys: a public key and a private key. A public key can be derived from a private key, but not vice versa. It's a well-known fact that public keys are meant to be public and can be widely shared. Unlike public keys, private keys must not be shared and must be kept secret by their owners.&lt;/p&gt;[....]</description><pubDate>Thu, 28 Nov 2019 11:16:56 -0000</pubDate><guid>https://rushter.com/blog/public-ssh-keys/</guid></item><item><title>Detecting SQL injections in Python code using AST</title><link>https://rushter.com/blog/detecting-sql-injections-in-python/</link><description>&lt;p&gt;Python has a built-in &lt;a href="https://docs.python.org/3/library/ast.html"&gt;ast&lt;/a&gt; module that lets you inspect, parse and edit Python code. AST stands for &lt;a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree"&gt;abstract syntax tree&lt;/a&gt;, a data structure that makes it easy to analyze, inspect and edit programming language code.&lt;/p&gt;
&lt;p&gt;When working with abstract trees, you don't have to worry about the syntax of a programming language. Abstract trees represent relations between objects, operators and language expressions.&lt;/p&gt;
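&lt;p&gt;As a minimal sketch of that idea (not necessarily the approach the full article takes), the snippet below parses a one-line example with the standard &lt;code&gt;ast&lt;/code&gt; module and checks whether the first argument of a call is built with the % operator. The sample query and all variable names are assumptions made for illustration.&lt;/p&gt;

```python
import ast

# Hypothetical line of code to analyze; it is only parsed, never executed.
source = "cursor.execute('SELECT * FROM users WHERE id = %s' % user_id)"

tree = ast.parse(source)

# Find every call node in the tree, regardless of how the code was formatted.
calls = [node for node in ast.walk(tree) if isinstance(node, ast.Call)]

# For attribute calls such as cursor.execute(...), record the method name.
called = [c.func.attr for c in calls if isinstance(c.func, ast.Attribute)]

# A first argument that is a BinOp with a Mod operator means the query
# string is assembled with % formatting before it reaches the call.
first_arg = calls[0].args[0]
uses_percent_formatting = (
    isinstance(first_arg, ast.BinOp) and isinstance(first_arg.op, ast.Mod)
)

print(called, uses_percent_formatting)
```

&lt;p&gt;The check works on node types rather than on text, so whitespace, quoting style and line breaks in the analyzed code do not affect it.&lt;/p&gt;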
&lt;p&gt;This article shows a real-world example of how you can use this module to detect &lt;a href="https://en.wikipedia.org/wiki/SQL_injection"&gt;SQL injection&lt;/a&gt; vulnerabilities in Python code.&lt;/p&gt;[....]</description><pubDate>Sun, 28 Apr 2019 12:13:58 -0000</pubDate><guid>https://rushter.com/blog/detecting-sql-injections-in-python/</guid></item><item><title>How Python saves memory when storing strings</title><link>https://rushter.com/blog/python-strings-and-memory/</link><description>&lt;p&gt;Since Python 3, the &lt;code&gt;str&lt;/code&gt; type uses Unicode representation. Unicode strings can take up to 4 bytes per character depending on the encoding, which sometimes can be expensive from a memory perspective. &lt;/p&gt;
&lt;p&gt;To reduce memory consumption and improve performance, Python uses three kinds of internal representations for Unicode strings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 byte per char (Latin-1 encoding) &lt;/li&gt;
&lt;li&gt;2 bytes per char (UCS-2 encoding)  &lt;/li&gt;
&lt;li&gt;4 bytes per char (UCS-4 encoding) &lt;/li&gt;
&lt;/ul&gt;
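&lt;p&gt;A rough way to observe these three representations, assuming CPython 3.3 or later, is to measure how much a string grows per added character with &lt;code&gt;sys.getsizeof&lt;/code&gt;. The sample characters and the helper name &lt;code&gt;bytes_per_char&lt;/code&gt; below are illustrative choices, not taken from the article.&lt;/p&gt;

```python
import sys

# One sample character per storage class (escapes keep the source ASCII-only).
samples = {
    "ascii": "a",
    "latin-1": "\u00e9",     # e with acute accent
    "ucs-2": "\u0416",       # Cyrillic letter Zhe
    "ucs-4": "\U0001F600",   # emoji outside the Basic Multilingual Plane
}


def bytes_per_char(ch, n=1000):
    # Comparing two lengths cancels out the fixed per-object header,
    # leaving only the storage cost of the extra n characters.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


per_char = {name: bytes_per_char(ch) for name, ch in samples.items()}
print(per_char)
```

&lt;p&gt;Other Python implementations may store strings differently, so the per-character figures are specific to CPython.&lt;/p&gt;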
&lt;p&gt;When programming in Python, all strings behave the same, and most of the time we don't notice any difference. However, the difference can be significant and sometimes unexpected when working with large amounts of text.&lt;/p&gt;[....]</description><pubDate>Thu, 09 Aug 2018 22:33:30 -0000</pubDate><guid>https://rushter.com/blog/python-strings-and-memory/</guid></item><item><title>How virtual environment libraries work in Python</title><link>https://rushter.com/blog/python-virtualenv/</link><description>&lt;p&gt;Have you ever wondered what happens when you activate a virtual environment and how it works internally? Here is a quick overview of the internals behind popular virtual environment tools, e.g., virtualenv, virtualenvwrapper, conda, pipenv.&lt;/p&gt;
&lt;p&gt;Initially, Python didn't have built-in support for virtual environments, and the feature was implemented as a hack. As it turns out, this hack is based on a simple concept.&lt;/p&gt;
&lt;p&gt;When Python starts its interpreter, it searches for the site-specific directory where all packages are stored. The search starts at the parent directory of the Python executable's location and continues by backtracking the path (i.e., looking at the parent directories) until it reaches the root directory. To determine whether a directory is the site-specific one, Python looks for the &lt;code&gt;os.py&lt;/code&gt; module, which Python requires in order to work.&lt;/p&gt;[....]</description><pubDate>Fri, 29 Jun 2018 11:07:40 -0000</pubDate><guid>https://rushter.com/blog/python-virtualenv/</guid></item></channel></rss>