$ touch file.txt
$ ln file.txt another-name-for-a-file.txt
$ echo Hello > file.txt
$ cat another-name-for-a-file.txt
Hello

So, these things are great if you want one file to appear in more than one place. At one point I thought that a hard link could let two processes running under different users access and modify one file even if each name had a pretty strict access mode of its own. I could not have been more wrong.

A directory contains a mapping from file names to inodes, and a hard link is just another file name for the same inode. The directory entry does not carry the ownership information or the access mode:

$ ls -l file.txt another-name-for-a-file.txt
-rw-rw-r-- 2 rtg rtg 6 Feb 23 16:34 another-name-for-a-file.txt
-rw-rw-r-- 2 rtg rtg 6 Feb 23 16:34 file.txt

$ chmod 0600 another-name-for-a-file.txt
$ ls -l file.txt another-name-for-a-file.txt
-rw------- 2 rtg rtg 6 Feb 23 16:34 another-name-for-a-file.txt
-rw------- 2 rtg rtg 6 Feb 23 16:34 file.txt

Yes, that information is stored in the inode itself and it is simply shared between all the names of the file.
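The same experiment can be repeated from a script. Below is a minimal sketch in Python 3 (the language and the helper names link_info and demo are my choice for illustration, not part of the original session); it creates two names in a temporary directory and shows that both report the same inode number, link count and mode.

```python
import os
import tempfile


def link_info(path):
    """Return (inode number, link count, permission bits) for a path."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink, st.st_mode & 0o777


def demo():
    # Work in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "file.txt")
        alias = os.path.join(tmp, "another-name-for-a-file.txt")

        with open(original, "w") as f:
            f.write("Hello\n")
        os.link(original, alias)      # second name, same inode

        os.chmod(alias, 0o600)        # change the mode through one name...
        return link_info(original), link_info(alias)  # ...and read it via both


if __name__ == "__main__":
    print(demo())
```

Both tuples come back identical, because the mode lives in the inode and not in either directory entry.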

Hard links cannot cross filesystem boundaries because they reference an existing inode number rather than a path, and an inode number only means something within its own filesystem.

You can’t create a hard link to a directory because it could introduce an infinite loop while traversing the directories. You can still create such a loop with symbolic links, but the system utilities will handle it for you, because it is clear which file name is the canonical one:

$ mkdir /tmp/a
$ ln -s /tmp/a /tmp/a/b
$ ls -lHR /tmp/a
/tmp/a:
total 0
lrwxrwxrwx 1 rtg rtg 6 Feb 23 16:47 b -> /tmp/a
$ find -L /tmp/a
/tmp/a
find: File system loop detected; `/tmp/a/b' is part of the same file system loop as `/tmp/a'.

At some point I realized that I had no idea why hard links would be useful. However, as a nice blog post by Paul Cobbaut and its comments suggest, hard links are used when renaming a file within a single file system: first a new link to the same inode is created, then the old one is removed.
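The link-then-unlink sequence is easy to try by hand. This is only a sketch in Python 3 with made-up helper names (rename_via_link, demo); real code should call os.rename, which asks the kernel to do the swap atomically.

```python
import os
import tempfile


def rename_via_link(old, new):
    """Give the inode a second name, then drop the first one."""
    os.link(old, new)     # both names now point at the same inode
    os.unlink(old)        # the data survives: `new` still references it


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "old.txt")
        new = os.path.join(tmp, "new.txt")
        with open(old, "w") as f:
            f.write("Hello\n")

        inode_before = os.stat(old).st_ino
        rename_via_link(old, new)
        # Same inode under the new name, and the old name is gone.
        return inode_before == os.stat(new).st_ino, os.path.exists(old)


if __name__ == "__main__":
    print(demo())
```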

Good to know

sil

2013-02-23 at 17:21

They’re useful in the case of mostly-duplicated data, too. Backups are a good example of this. If you’ve got a folder things/ with thing1, thing2, thing3 in it, and you create a backup of it on your backup server, you’ll have:

backupserver:
/backup-2013-02-23
    /things
        thing1
        thing2
        thing3

then you delete thing2 from your disk, and back up again. In theory, then, you’d get a new folder on the backup server containing thing1 and thing3, so the backup server would have two copies of thing1 and thing3, which is wasteful of space because they haven’t changed. However, in practice, you get this:

backupserver:
/backup-2013-02-23
    /things
        thing1
        thing2
        thing3
/backup-2013-02-24
    /things
        thing1 (hardlink to thing1 in previous folder)
        thing3 (hardlink to thing3 in previous folder)

so there’s only actually one copy of thing1, even though it’s in both backup folders. This means that every backup folder looks like a full backup, but only actually takes the space of an incremental backup.
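A rough sketch of how such a snapshot could be produced, in Python 3. This is not how any particular backup tool does it; snapshot and demo are hypothetical names, and the sketch assumes that the previous and the new backup folders live on the same filesystem.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            old = os.path.join(previous, rel, name) if previous else None
            new = os.path.join(target, rel, name)
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)         # unchanged: reuse the existing inode
            else:
                shutil.copy2(src, new)    # new or modified: store a real copy


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        things = os.path.join(tmp, "things")
        os.mkdir(things)
        for name in ("thing1", "thing2", "thing3"):
            with open(os.path.join(things, name), "w") as f:
                f.write(name)

        day1 = os.path.join(tmp, "backup-2013-02-23")
        day2 = os.path.join(tmp, "backup-2013-02-24")
        snapshot(things, None, day1)               # first, full backup
        os.remove(os.path.join(things, "thing2"))
        snapshot(things, day1, day2)               # second backup reuses inodes

        same_inode = (os.stat(os.path.join(day1, "thing1")).st_ino ==
                      os.stat(os.path.join(day2, "thing1")).st_ino)
        return same_inode, sorted(os.listdir(day2))


if __name__ == "__main__":
    print(demo())
```

In the demo, thing1 in the second folder shares its inode with thing1 in the first one, so it takes no extra space, while thing2 is simply absent from the second folder.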

Stack Overflow Micro-HOWTO

Roman Yepishev

2013-02-14 16:19

You may have heard about stack overflow (no, not the web site), but you may have never had a chance to experience what that really is.

In Linux you can control the stack size with "ulimit -s". By default it is 8 MB on an Ubuntu machine:

$ ulimit -s
8192
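
The same limit can be read from inside a process. A small sketch in Python 3 using the standard resource module (Unix only); stack_limit_kb is a made-up helper name. getrlimit reports bytes, while "ulimit -s" prints kilobytes, hence the division.

```python
import resource


def stack_limit_kb():
    """Return the soft stack limit in KB, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024


if __name__ == "__main__":
    print(stack_limit_kb())
```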

The program below causes a stack overflow. Please note that the application does nothing, however it manages to fill its stack space completely.

int main(int argc, char** argv) {
    char stack[8192 * 1024];
    return 0;
}

$ gcc -o stack stack.c
$ ./stack
Segmentation fault (core dumped)

Even though 8 MB is available to the program, various other things also have to go on the stack, such as function arguments and return addresses, so an 8 MB array leaves no room for them. When a recursive function breaks and calls itself indefinitely, it eventually uses up all the stack space and crashes in exactly the same way.

Read more about this in the C++ Tutorial.

Android 2.2 with Classless Static Routes

Roman Yepishev

2013-01-30 13:08

I have a second network at home for my virtual machines, and DHCP is set up to hand out classless routing information. I usually use a 3G connection, but today I enabled WiFi on my Android phone running 2.2. WiFi came up, but no hosts could be reached: the phone had not set a default route.

It turns out that Android actually implements this the right way:

If the DHCP server returns both a Classless Static Routes option and a Router option, the DHCP client MUST ignore the Router option.

—RFC 3442 – The Classless Static Route Option for Dynamic Host Configuration Protocol (DHCP) version 4

Oh, so my network is broken! I added the default route to the classless static route option (which immediately triggered a bug in NetworkManager, though not a critical one, since the gateway is still picked up from the Router option), but now my phone failed to get the DNS server.

After I forced DHCP option 6 with the IP address of my DNS server, the phone finally connected to the outside world via WiFi.

So now my dnsmasq uci config started to look like this and it works for me:

config 'dhcp' 'lan'
    ...
    # option 121
    list 'dhcp_option' 'option:classless-static-route,192.168.100.0/24,192.168.1.10,0.0.0.0/0,192.168.1.1'
    # option 6
    list 'dhcp_option' 'option:dns-server,192.168.1.1'
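
For the curious, this is roughly what option 121 looks like on the wire. RFC 3442 encodes each route as the prefix length, then only the significant octets of the destination, then the four octets of the router. The sketch below is Python 3 and encode_classless_routes is a hypothetical helper; dnsmasq does this encoding itself.

```python
import ipaddress


def encode_classless_routes(routes):
    """Encode (destination CIDR, router) pairs as an option 121 payload."""
    payload = bytearray()
    for destination, router in routes:
        network = ipaddress.ip_network(destination)
        prefix_len = network.prefixlen
        significant = (prefix_len + 7) // 8    # octets needed for the prefix
        payload.append(prefix_len)
        payload += network.network_address.packed[:significant]
        payload += ipaddress.ip_address(router).packed
    return bytes(payload)


if __name__ == "__main__":
    routes = [
        ("192.168.100.0/24", "192.168.1.10"),
        ("0.0.0.0/0", "192.168.1.1"),    # the default route needs no destination octets
    ]
    print(list(encode_classless_routes(routes)))
```

The default route is just a zero prefix length followed by the router address, which is why it has to be listed explicitly once the Router option is ignored.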

Getting rid of bash annoyances

Roman Yepishev

2013-01-09 12:28

$ zcat /boot/initrd.<tab><tab><TAB!>

I know the file is there and I know it’s a file an application can handle, but bash autocompletion tries to be smart by refusing to provide the file name.

This gets really annoying if you happen to work with files that don’t have an extension the autocompletion scripts expect or you are running a command using sudo and want to pass the filename.

There is a dedicated shortcut for filename completion (M-/ by default), but since I have never used tab completion for options, I can simply get rid of the programmable completion.

There is no need to uninstall anything: adding complete -r to .bashrc will remove all the completion functions.

To completely disable programmatic completion, add shopt -u progcomp to .bashrc.

$ zcat /boot/initrd.img-3.<tab>
initrd.img-3.2.0-29-generic         initrd.img-3.7.0-7-generic
initrd.img-3.2.0-35-generic         initrd.img-3.8.0-030800rc2-generic

Nice.

Another thing that had been annoying me for a while is the command-not-found package. It is extremely helpful for discovering which package contains the application I want. However, there were quite a few times when I made a typo, pressed Enter, noticed it right away, but had to wait for a second or so before I got the “command not found” message and the prompt back. When the disk is really busy, a typo costs me another 30 seconds of disk thrashing before command-not-found comes up with a friendly suggestion that killal is better spelled as Command 'killall' from package 'psmisc'.

Having command-not-found installed and available but not kicking in on every occasion is preferred. The function bash runs in case it can’t find the command is command_not_found_handle so we simply need to unset this function in .bashrc and add an alias (in my case packages-providing after LP:486716, but it can be anything) which will execute the real command-not-found script:

unset command_not_found_handle
alias packages-providing='/usr/lib/command-not-found --no-failure-msg'

So now I will be given the package name only when I want it:

$ sl
bash: sl: command not found

$ packages-providing sl
The program 'sl' is currently not installed. You can install it by typing:
sudo apt-get install sl

Pete Lacey’s Weblog : The S stands for Simple:

Roman Yepishev

2012-12-23 10:52

Every time I hear about SOAP being used for something I immediately want to direct the speaker to this blog post:

The S stands for simple

r8169 module with DKMS in Ubuntu 12.04 LTS

Roman Yepishev

2012-12-18 13:04

When testing network issues with different cards it is a good idea to make sure they actually have different chipsets.

Update: I found no difference between the NIC operating with the built-in r8169 module and the version that is available on the Realtek web site. You will want to use the vendor-provided module only if your card is not supported by the driver shipped with the Linux kernel.

Update: I packaged the source-only dkms – r8169-6.017.00-source-only.dkms.tar.gz, which can be installed with:

sudo dkms ldtarball r8169-6.017.00-source-only.dkms.tar.gz
sudo dkms build r8169/6.017.00
sudo dkms install r8169/6.017.00

Update: See the end of the post; a patch is required for the DGE-528T to be picked up by this module.

I had a complex VM network setup with a separate VLAN on the router for virtual machines and a second network card in the server to prevent connections from hanging when the network topology changed on VM startup. Then I decided to simplify the network and use one NIC only; in my case that would be the D-Link DGE-528T.

This card is driven by a Realtek 8169 chip, and for some reason these like to drop the network connection. I’ve found quite a few topics on poor performance of the 8169 with the in-kernel driver, but the built-in 8168 chip running under the r8169 driver included in the kernel performs way worse than the 8169.

There is a driver update on the Realtek web site so I decided to give it a try.

I started with the dkms.conf from the Fixing RTL8111/8168B kernel module on Debian/Ubuntu post by Randy, but for some reason the r8169 module kept overwriting the system-wide one. It turns out that the Makefile installs the module automatically when called with no specific target.

So here’s how I made it work:

Download the tarball from Realtek web site.
Unpack the resulting r8169-6.017.00.tar.bz2 to /usr/src.

Add the dkms.conf file to /usr/src/r8169-6.017.00/:

PACKAGE_NAME="r8169"
PACKAGE_VERSION="6.017.00"
CLEAN="make clean"
BUILT_MODULE_NAME[0]="r8169"
BUILT_MODULE_LOCATION[0]="src"
DEST_MODULE_LOCATION[0]="/updates"
MAKE[0]="'make' 'modules'"
AUTOINSTALL="YES"

Build and install the module:

$ sudo dkms install r8169/6.017.00
...
 - Installation
   - Installing to /lib/modules/3.2.0-35-generic/updates/dkms/
...

At the moment, due to a bug in the dkms script, it is not possible to create the dkms tarball directly from the Realtek sources, but you can create one after you perform the steps above.

$ sudo dkms mktarball r8169/6.017.00
Marking modules for 3.2.0-35-generic (x86_64) for archiving...

Marking /var/lib/dkms/r8169/6.017.00/source for archiving...

Tarball location: /var/lib/dkms/r8169/6.017.00/tarball//r8169-6.017.00-kernel3.2.0-35-generic-x86_64.dkms.tar.gz

Afterwards you can load this tarball on a different machine:

$ sudo dkms ldtarball r8169-6.017.00-kernel3.2.0-35-generic-x86_64.dkms.tar.gz

Loading tarball for r8169-6.017.00
Loading /var/lib/dkms/r8169/6.017.00/3.2.0-35-generic/x86_64...

DKMS: ldtarball completed.

Creating symlink /var/lib/dkms/r8169/6.017.00/source ->
                /usr/src/r8169-6.017.00

DKMS: add completed.

$ sudo dkms install r8169/6.017.00

Creating symlink /var/lib/dkms/r8169/6.017.00/source ->
                /usr/src/r8169-6.017.00

DKMS: add completed.

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area....
'make' 'modules'....
cleaning build area....

DKMS: build completed.

r8169.ko:
Running module version sanity check.
- Original module
- Installation
- Installing to /lib/modules/3.2.0-35-generic/updates/dkms/

depmod....

DKMS: install completed.

DGE-528T

So I left this module running for about a week, and yesterday I could not make the card work after a reboot. It turns out that while the updated r8169 module is installed, it is not being used for the D-Link DGE-528T card.

r8169_n.c contains the following:

static struct pci_device_id rtl8169_pci_tbl[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_REALTEK,     0x8167), 0, 0, RTL_CFG_0 },
        { PCI_DEVICE(PCI_VENDOR_ID_REALTEK,     0x8169), 0, 0, RTL_CFG_0 },
        { PCI_VENDOR_ID_DLINK, 0x4300, PCI_VENDOR_ID_DLINK, 0x4c00, 0, 0, RTL_CFG_0 },
        {0,},
};

So the module only matches the PCI_VENDOR_ID_DLINK:0x4c00 subsystem, while my card reports 1186:4300 as its subsystem (lspci -vnn):

02:06.0 Ethernet controller [0200]: D-Link System Inc DGE-528T Gigabit Ethernet Adapter [1186:4300] (rev 10)
    Subsystem: D-Link System Inc DGE-528T Gigabit Ethernet Adapter [1186:4300]
    Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 21
    I/O ports at e800 [size=256]
    Memory at febffc00 (32-bit, non-prefetchable) [size=256]
    Expansion ROM at febc0000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
    Kernel driver in use: r8169
    Kernel modules: r8169

1186:4c00 is actually a DGE-530T card, but it looks like some DGE-528T cards have the 1186:4300 subsystem and some have the 1186:4c00 one. The original driver from D-Link has PCI_ANY_ID in the subsystem fields, so it looks like these are compatible.

I was so sure the updated module would fix the issues I was having with the network card that it actually stopped misbehaving.

However, I decided to patch the module to make it work with my card too:

--- r8169_n.c.ori   2012-05-03 15:23:12.000000000 +0300
+++ r8169_n.c   2012-12-30 19:42:14.818322918 +0200
@@ -115,7 +115,8 @@
static struct pci_device_id rtl8169_pci_tbl[] = {
    { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8167), 0, 0, RTL_CFG_0 },
    { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8169), 0, 0, RTL_CFG_0 },
-   { PCI_VENDOR_ID_DLINK, 0x4300, PCI_VENDOR_ID_DLINK, 0x4c00, 0, 0, RTL_CFG_0 },
+   { PCI_DEVICE(PCI_VENDOR_ID_DLINK,       0x4300), 0, 0, RTL_CFG_0 },
+   //{ PCI_VENDOR_ID_DLINK, 0x4300, PCI_VENDOR_ID_DLINK, 0x4c00, 0, 0, RTL_CFG_0 },
    {0,},
};
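
Why the patched entry matches can be shown with a toy model of the ID table lookup. This is a simplified Python 3 sketch, not kernel code: it only compares the four vendor/device/subsystem fields, treats PCI_ANY_ID as a wildcard, and ignores the class fields; PciId, pci_device and matches are names I made up for the illustration.

```python
from collections import namedtuple

PCI_ANY_ID = 0xFFFFFFFF      # wildcard value in the kernel's ID tables
PCI_VENDOR_ID_DLINK = 0x1186

PciId = namedtuple("PciId", "vendor device subvendor subdevice")


def pci_device(vendor, device):
    """Mimic the PCI_DEVICE() macro: any subsystem IDs are accepted."""
    return PciId(vendor, device, PCI_ANY_ID, PCI_ANY_ID)


def matches(entry, dev):
    """An entry matches when every field is a wildcard or equals the device's."""
    return all(want in (PCI_ANY_ID, have) for want, have in zip(entry, dev))


# My card as reported by lspci: device 1186:4300, subsystem 1186:4300.
card = PciId(0x1186, 0x4300, 0x1186, 0x4300)

original_entry = PciId(PCI_VENDOR_ID_DLINK, 0x4300, PCI_VENDOR_ID_DLINK, 0x4c00)
patched_entry = pci_device(PCI_VENDOR_ID_DLINK, 0x4300)

if __name__ == "__main__":
    print(matches(original_entry, card), matches(patched_entry, card))
```

The original entry insists on the 0x4c00 subsystem and rejects the card, while the PCI_DEVICE() form accepts any subsystem.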

And now the module prints the following when loaded:

[18014.031288] r8169 Gigabit Ethernet driver 6.017.00-NAPI loaded
[18014.031344] r8169 0000:02:06.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[18014.033537] r8169: This product is covered by one or more of the following patents: US5,307,459, US5,434,872, US5,732,094, US6,570,884, US6,115,776, and US6,327,625.

So now I am really testing the module provided by Realtek.

WSGI hosting with nic.ua

Roman Yepishev

2012-12-07 15:06

On November 26 I received a message from nic.ua saying that any of their services could be ordered at half price. nic.ua is a domain name registrar and, since merging with hosted.ua, also provides hosting.

I had long been looking for a home for my local django project, so I decided to rent hosting on the NIC•1 plan. I first read in their knowledge base that Python can be used in WSGI mode, and then checked the mod_wsgi module myself on the server that hosts this blog.

The hosting runs on cPanel/WHM. Until recently I had used only VPS solutions and was used to configuring everything myself, so moving the blog was my first encounter with cPanel.

Ideally, Python modules are installed on the server over a remote SSH connection. nic.ua provides SSH access, which is listed under “additional services”. I asked the support team about the conditions:

SSH access is granted manually. Required:

A scanned copy of your passport (the first 2 spreads)

The static IP address from which you work with the hosting.

The hosting login.

A top-up of your account by 40₴ and the number of the paid invoice. The accounting department will deduct this money.

The service is provided once for the whole duration of the hosting.

I decided to start by reproducing the remote system at home in a virtual machine, building all the necessary modules, and ordering SSH access only if the compiled binary modules refused to work. cPanel installs all the software automatically, but I did not want to deal with it on my own server yet. Besides, cPanel requires a license.

So, the server assigned to me had the following installed:

CentOS 6.3 x86_64
Apache 2.2.23 (standard in CentOS 6.3, but rebuilt in /usr/local)
Python 2.7.3, installed to /opt (python 2.6.6 lives in /usr)
setuptools and virtualenv already installed
mod_wsgi 3.3, built with python 2.7.3
nginx 1.2.4, which proxies all requests to Apache and serves static files
PostgreSQL 8.4.13 (in CentOS 6.3)

After installing CentOS 6.3 in a minimal configuration I added the required packages:

yum groupinstall "Development tools"
yum install httpd-devel
yum install postgresql-devel

First I built Python (I took the list of packages to install from How to install Python 2.7.3 on CentOS 6.2, because I had not worked with RPM systems since 2006). The following has to be passed at configure time:

./configure --prefix=/opt/python2.7 --with-threads --enable-shared

Then I built mod_wsgi, pointing it at the python in /opt/python2.7/bin/python. In /etc/httpd/conf.d/mod_wsgi.conf I also added the path where the sockets should be placed:

LoadModule wsgi_module modules/mod_wsgi.so
WSGISocketPrefix run/

Next I created a virtual host for test.staging.lappyfamily.net, added setuptools to the new Python installation, and installed virtualenv via easy_install.

On a server with full access, modules can be installed into the system directories; where that is not possible, virtualenv can be used instead.

I performed the remaining steps as a regular user.

I created a new virtualenv in my home directory and, with the new environment activated, added psycopg2 and django via easy_install. After that I created a new simple django project.

So that the project could find the modules installed in the virtualenv, I put the following in the WSGI script:

import site
site.addsitedir('/home/lappyfam/virtualenv/lib/python2.7/site-packages')
site.removeduppaths()

lappyfam is my user name.

Once my application was ready and working correctly locally, I archived the created directories into a tar.gz (because the virtual environment contains symbolic links, which cannot be created over FTP) and unpacked it with the file manager in the cPanel web interface.

The project worked.

Activation

In the default configuration WSGI runs under the same account as apache. The article on how to use django says that to activate a project you need to submit a request to the support team.

At nic.ua, activation means adding the following lines to the virtual host configuration:

WSGIScriptAlias / /path/to/wsgi.py
WSGIDaemonProcess $PROJECT_NAME user=$USER group=$GROUP processes=2 threads=1 display-name=%{GROUP}
WSGIProcessGroup $PROJECT_NAME
WSGIApplicationGroup %{GLOBAL}

After this the WSGI process runs under the user's account. Note that there is no need to add an Alias for static files: the request is handled first by nginx, which itself checks DOCUMENT_ROOT for the requested files.

If WSGIScriptAlias is used on the root directory, .htaccess is ignored. The support team met me halfway and removed the WSGIScriptAlias line from the configuration. In that case the wsgi script has to live in DOCUMENT_ROOT, and .htaccess will contain something like:

Options +ExecCGI
RewriteEngine On
AddHandler wsgi-script .wsgi

# Prefer django.wsgi not to be visible
RewriteRule ^django.wsgi$ / [R,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /django.wsgi/$1

Basic/Digest Auth

If wsgi is used via RewriteRule without WSGIScriptAlias, django will not work correctly with Basic or Digest Auth, because RemoteUserMiddleware will not receive the REMOTE_USER variable (if, for example, authorization happened in the /api folder while the WSGI script itself sits in the root). A REDIRECT_REMOTE_USER variable is passed instead, which django does not use.

To make django use the new variable, a new middleware has to be created:

from django.contrib.auth.middleware import RemoteUserMiddleware

class RedirectRemoteUserMiddleware(RemoteUserMiddleware):
    header = 'REDIRECT_REMOTE_USER'

Database

To be able to back up the database locally and run schema migrations directly through django-admin, I sent the support team a request justifying the need for database access, and access was granted 4 hours later. This service is free, but the client needs a static address. There is an article about MySQL here, but the conditions for PostgreSQL are no different.

If PostgreSQL refuses access via localhost, use the address 127.0.0.1, because localhost resolves to the IPv6 address ::1. I reported this to the support team, and within a few minutes the necessary configuration changes had been made.

Mail

The local server refuses to send mail if the local user given as the sender does not exist. For my application to be able to send a message, a local mailbox has to be created, or mail has to be forwarded to another one.

If this is not done, the remote server (for example GMail or Yandex) cannot verify that the account exists and replies:

{'recipient@example.net':
(550, 'Verification failed for \nNo Such User Here"\nSender verify failed')}

DKIM

DKIM support can be enabled in cPanel. Although the help text hints only at verification of incoming mail, this setting controls the signing of outgoing messages. For the required DKIM headers to be added, messages must be sent through the local SMTP server. Using sendmail directly does not work (this only concerns third-party scripts; django does not use sendmail).

The private and public keys are created automatically and cannot be changed or viewed through cPanel. Since my domain is registered and hosted at GoDaddy, all I needed was to extract the public key:

$ dig TXT default._domainkey.lappyfamily.net @tzk105.nic.ua
...
;; ANSWER SECTION:
default._domainkey.lappyfamily.net. 14400 IN TXT "v=DKIM1\; k=rsa\; p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAOOL77/ce8tGGPpMOOEV/3RQyDza+O9onYOboDZ2SQPbOvWZvA0k8GCnZm/5ZsvR8VOFrimJ/jYYQLJfi+cCJDlqd1Yl7rMP1Xo9t+W55rvjKt9UYo6Ean05h1K6qd6B3QIDAQAB\;"

I learned which selector is used (default) from a message I had sent:

...
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lappyfamily.net; s=default;
...
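
The DNS name to query can be derived from the signature header itself: it is the s= (selector) tag, then _domainkey, then the d= (domain) tag. A minimal sketch in Python 3; parse_tag_list and dkim_record_name are hypothetical helpers, and the parser assumes a simple unfolded header value like the one above.

```python
def parse_tag_list(value):
    """Split a 'tag=value; tag=value' list into a dict."""
    tags = {}
    for part in value.split(";"):
        if "=" in part:
            name, _, tag_value = part.partition("=")
            tags[name.strip()] = tag_value.strip()
    return tags


def dkim_record_name(signature_value):
    """Build the DNS name of the TXT record holding the public key."""
    tags = parse_tag_list(signature_value)
    return "{}._domainkey.{}".format(tags["s"], tags["d"])


if __name__ == "__main__":
    header = ("v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; "
              "d=lappyfamily.net; s=default;")
    print(dkim_record_name(header))
```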

cPanel may dislike the fact that the server is not the primary DNS for this zone, but this message can safely be ignored.

After I added the DKIM data to my DNS, cPanel stopped complaining.

At the moment a dedicated IP address can be ordered only through the support team and costs 25₴ for the whole hosting period. A dedicated IPv6 address is not provided because cPanel lacks support for it. In the future cPanel should start working correctly with the new addresses.

nic.ua is working on making this service orderable through the “personal account”, but I was warned that the price may change afterwards.

Because of cPanel limitations, only one IP address can be assigned to an account and an SSL certificate can be installed for only one domain. cPanel does not support SNI, so using SSL for more than one domain (or subdomain) is not possible.

This should also change in the future: [Case 46856] SNI ( Server Name Indicator ), SSL support in cPanel.

mod_security specifics

The servers have the mod_security module installed. To keep badly written bots away from the sites, mod_security filters requests by the User-Agent header. The default User-Agent values of curl and lwp-request get 404 Not found in response to any request, so if software has to work with a service on this hosting, it must send its own unique User-Agent header.
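
A client written in Python can set its own header in one line. A small sketch using the Python 3 standard library; build_request, the URL and the agent string are placeholders I made up, and nothing is sent until the request is passed to urlopen.

```python
import urllib.request


def build_request(url, user_agent):
    """Prepare a request that identifies itself with a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("http://example.net/api/", "my-importer/1.0")
    # urllib stores header names capitalized as 'User-agent'.
    print(request.get_header("User-agent"))
    # urllib.request.urlopen(request) would actually send it.
```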

Uploading files

FTP access is provided by default; SSL is supported, but the certificate is self-signed. To make lftp accept this SSL certificate, use set ssl:verify-certificate false.

cPanel also has its own WebDAV server, but it does not allow changing file access modes (chmod).

Result

My project has successfully moved to nic.ua, and now the home server can safely be switched off.

While progressive humanity uses VPSes and spins up virtual machines on Amazon, this type of hosting continues to exist. It suits most simple projects: there is no need to handle the hardware yourself, to worry about the database, or to worry about running out of available RAM on a cheap VPS.

However, this type of hosting can only host projects that do not use background processing programs (for example, celery).

It is also unsuitable for projects that use more than one domain with SSL.

Thanks to Dmytro Khvetkevych of nic.ua technical support for his help with the setup and with resolving issues.

When localhost is not

Roman Yepishev

2012-12-06 15:07

/etc/hosts is a file where these entries should never be touched:

127.0.0.1       localhost
::1             ip6-localhost ip6-loopback

However, during my last trip to my VPS to fix my mail system after an opendkim update in Ubuntu 10.04, I found something interesting in netstat:

$ sudo netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address        ... PID/Program name
...
tcp        0      0 173.212.238.58:8891  ... 15410/opendkim

Basically, when I specified localhost in opendkim configuration, it was listening on a public interface instead. Pinging localhost revealed it is actually a non-loopback address:

$ ping localhost
PING yankee.lappyfamily.net (173.212.238.58) 56(84) bytes of data.
...

It looks like it’s been this way since the very beginning, as my /etc/hosts had the following:

# Auto-generated hostname. Please do not remove this comment.
173.212.238.58 yankee.lappyfamily.net  yankee localhost 204538 localhost.localdomain

And this was clearly a misconfiguration (I am sure 204538 is a good hostname).

I looked at my local Ubuntu installation and updated the VPS so that the hosts file became:

127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
# Auto-generated hostname. Please do not remove this comment.
173.212.238.58 yankee.lappyfamily.net  yankee

After this I restarted all the applications that were supposed to listen on the loopback interface and verified the fix with netstat again.

First of all, you need to have a firewall configured on your servers and allow only trusted incoming connections to trusted applications. This is what prevented my opendkim installation from accepting the incoming requests from the internet.

Second, you need to verify that localhost actually refers to the loopback interface and does not resolve to your public one, as you have a fully qualified name for that purpose.
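
The second check is easy to automate. A sketch in Python 3 with the standard socket and ipaddress modules; is_loopback and non_loopback_addresses are hypothetical helper names, and an empty result is what a healthy /etc/hosts should produce.

```python
import ipaddress
import socket


def is_loopback(address):
    """True for 127.0.0.0/8 and ::1 style addresses."""
    # Drop an IPv6 scope suffix such as '%lo0' before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def non_loopback_addresses(hostname="localhost"):
    """Return every address the name resolves to that is not a loopback one."""
    infos = socket.getaddrinfo(hostname, None)
    addresses = {info[4][0] for info in infos}
    return sorted(a for a in addresses if not is_loopback(a))


if __name__ == "__main__":
    print(non_loopback_addresses())
```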

I found that the control panel for the VPS I am using now generates the hostname line correctly, but that may not have been the case a year ago when I first configured the VPS.

test data for unittest.TestCase

Roman Yepishev

2012-12-05 18:01

Trivial but useful.

Recently I needed to test the behavior of the function that fetched some remote resource. I wanted to control how it works and supply my own cached version stored as a file.

While I originally thought unittest should support this using some sort of a method to get testdata directory, it is actually quite easy to implement. You only need to create some folder (I called it “testdata”) in the “tests” directory and then you can refer to it using plain old reference to __file__:

import os
...
# the "testdata" folder sits next to the test module
testdata_dir = os.path.join(os.path.dirname(__file__), 'testdata')
testfile = os.path.join(testdata_dir, 'somefile.xml')
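
Wrapped into a helper, the idea looks like this. A sketch in Python 3; fixture_path is a name I made up, and it assumes the “testdata” folder sits next to the test module that calls it.

```python
import os


def fixture_path(test_module_file, name):
    """Return the path of `name` inside the 'testdata' folder next to the module."""
    module_dir = os.path.dirname(os.path.abspath(test_module_file))
    return os.path.join(module_dir, "testdata", name)


if __name__ == "__main__":
    # Inside a test you would pass __file__ as the first argument.
    print(fixture_path("/srv/project/tests/test_fetch.py", "somefile.xml"))
```

A test method can then open fixture_path(__file__, 'somefile.xml') and feed the contents to the function under test instead of hitting the network.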

It took me a while to realize that I could simply use __file__ and not bother creating an HTTP server.

Disclaimer: No wheel was reinvented during these tests.

Importing disqus comments into WordPress

Roman Yepishev

2012-12-04 21:43

Long story short: this is not trivial.

Disqus is a popular service that provides commenting functionality. All you need to do for your HTML page to have embedded discussion is to add a bit of javascript.

They have awesome importers from various blog engines. This blog was originally hosted on Blogspot; the migration to a static blog generated by Octopress required me to search for alternative commenting facilities, and I decided to use disqus (is there anything else, really?). Now I have switched to WordPress and I wanted my comments back.

Disqus does provide the ability to export all the comments as an XML file. A quick Google search told me that the comments could be brought into WordPress by converting the disqus dump into WXR, but it is not really that easy: WordPress ignores a post that is already there and does not import its comments. There’s a plugin that imports disqus comments, but I wanted full support for nested comments and little to no PHP coding.

I dumped all the wordpress database locally and hacked up a script that reads the comments.xml file and puts the necessary data into the database. The script needs access to some sort of a database in order to figure out the post_id for each post and generate the comment identifiers for correct nesting.

#!/usr/bin/python
"""Quick and dirty hack to get disqus comments into WordPress DB"""

import sys
from datetime import datetime
from xml.etree import ElementTree as ET

import MySQLdb

ANONYMOUS_EMAIL = 'nobody@example.net'

# Database configuration
DATABASE = {
    'host': 'lab.lappyfamily.net',
    'user': 'rtg',
    'name': 'rtginua6_wp1',
}

# Admin information
ADMIN_INFO = {
    'comment_author': 'Roman Yepishev',
    'comment_author_email': 'roman.yepishev@yandex.ua',
    'user_id': 1
}

# Names used in disqus that represent administrator
ADMIN_ALIASES = set(['rtg', 'Roman', 'rye'])


class DisqusImporter(object):
    """
    Imports Disqus XML into MySQL database for WordPress
    """
    def __init__(self):
        self.wpdb = MySQLdb.connect(host=DATABASE['host'],
                                    user=DATABASE['user'],
                                    db=DATABASE['name'],
                                    charset='utf8')
        self.wp_post_url_to_id = {}

    def make_wordpress_url_map(self):
        """Creates URL->ID map for WordPress URLS"""
        cursor = self.wpdb.cursor()
        cursor.execute("""
            SELECT guid, id
            FROM wp_posts
            WHERE post_type = 'post'
        """)
        for row in cursor:
            self.wp_post_url_to_id[row[0]] = row[1]

    def parse_disqus_comments(self, path):
        """Parse comments creating WP-like structure"""
        NS = '{http://disqus.com}'
        NS_DI = '{http://disqus.com/disqus-internals}'

        tree = ET.parse(path)
        root = tree.getroot()

        comments = {}
        thread_id_to_url = {}

        # Gathering post threads identifiers.
        # Each thread corresponds to a blog post
        for thread in root.findall(NS + 'thread'):
            dsq_id = thread.attrib[NS_DI + 'id']
            link = thread.find(NS + 'link')
            thread_id_to_url[dsq_id] = link.text

        # Parsing posts
        for post in root.findall(NS + 'post'):
            dsq_id = post.attrib[NS_DI + 'id']
            thread_id = post.find(NS + 'thread').attrib[NS_DI + 'id']

            # If we don't have the mapping from post to
            # WordPress post ID, we can't
            # proceed with this comment
            thread_url = thread_id_to_url[thread_id]
            if thread_url not in self.wp_post_url_to_id:
                print "Skipping comment for {}".format(thread_url)
                continue

            created_at = post.find(NS + 'createdAt').text
            author = post.find(NS + 'author')
            email = author.find(NS + 'email').text
            name = author.find(NS + 'name').text

            parent = post.find(NS + 'parent')
            if parent is not None:
                parent_id = parent.attrib[NS_DI + 'id']
            else:
                parent_id = None

            # MySQL issues a warning if we stuff data in YYY-mm-ddTHH:MM:SSZ
            comment_date = datetime.strptime(
                created_at, '%Y-%m-%dT%H:%M:%SZ'
            ).strftime('%Y-%m-%d %H:%M:%S')

            comment_post_id = self.wp_post_url_to_id[thread_url]

            post_data = {
                'comment_post_ID': comment_post_id,
                'comment_content': post.find(NS + 'message').text,
                'comment_date': comment_date,
                'comment_date_gmt': comment_date,
                'comment_author': name,
                'comment_author_IP': post.find(NS + 'ipAddress').text,
                'comment_author_email': email if email else ANONYMOUS_EMAIL,
                'user_id': 0,
                'parent_id': parent_id,
                'children': []
            }

            # Fixup for my own comments
            if name in ADMIN_ALIASES:
                post_data.update(ADMIN_INFO)

            comments[dsq_id] = post_data

        # First pass - creating comment tree
        for comment in comments.values():
            if comment['parent_id']:
                parent_comment = comments[comment['parent_id']]
                parent_comment['children'].append(comment)

        # Second pass - dropping posts that are not toplevel
        # They are already in 'children'
        for comment_id in comments.keys():
            # If it is still here (we could have deleted it)
            if comment_id in comments:
                comment = comments[comment_id]
            else:
                continue

            if comment['parent_id']:
                del comments[comment_id]

        return comments

    def add_comment(self, comment, parent_id):
        """Add comment and all the child comments to the DB"""
        cursor = self.wpdb.cursor()

        comment['comment_parent'] = parent_id

        cursor.execute("""
            INSERT INTO wp_comments (
                comment_post_ID,
                comment_author,
                comment_author_email,
                comment_author_IP,
                comment_date,
                comment_date_gmt,
                comment_content,
                comment_parent,
                user_id)
            VALUES (
                %(comment_post_ID)s,
                %(comment_author)s,
                %(comment_author_email)s,
                %(comment_author_IP)s,
                %(comment_date)s,
                %(comment_date_gmt)s,
                %(comment_content)s,
                %(comment_parent)s,
                %(user_id)s)
            """, comment
        )

        parent_id = cursor.lastrowid

        for item in comment['children']:
            self.add_comment(item, parent_id)

    def update_comment_count(self):
        """
        Synchronizes cached comment count
        with the actual number of comments
        """
        cursor = self.wpdb.cursor()
        # Update post counts
        cursor.execute("""
            UPDATE wp_posts AS p
            LEFT JOIN (
                SELECT comment_post_ID,
                       count(comment_post_ID) as comment_count
                FROM wp_comments
                WHERE comment_approved = '1'
                GROUP BY comment_post_ID
            ) as c
            ON p.id = c.comment_post_ID
            SET p.comment_count = c.comment_count
            WHERE p.id = c.comment_post_ID
        """)

    def main(self, path):
        """Entry point"""
        self.make_wordpress_url_map()
        comments = self.parse_disqus_comments(path)
        for item in comments.values():
            self.add_comment(item, 0)
        self.update_comment_count()


if __name__ == "__main__":
    importer = DisqusImporter()
    importer.main(sys.argv[1])

...
ANONYMOUS_EMAIL = 'nobody@example.net'

# Database configuration
DATABASE = {
    'host': 'lab.lappyfamily.net',
    'user': 'rtg',
    'name': 'rtginua6_wp1',
}

# Admin information
ADMIN_INFO = {
    'comment_author': 'Roman Yepishev',
    'comment_author_email': 'roman.yepishev@yandex.ua',
    'user_id': 1
}

# Names used in disqus that represent administrator
ADMIN_ALIASES = set(['rtg', 'Roman', 'rye'])
...

And the script itself runs on an uncompressed disqus XML. The script will skip all pages that it could not find the URLs for:

$ python wp-import-disqus.py comments.xml
Skipping comment for http://rtg.in.ua/app/acer-exif-fixup/index.html

After the script finished, I dumped the comments table and uploaded it via the phpmyadmin interface.

$ mysqldump -h lab rtginua6_wp1 wp_comments > wp_comments.sql

This resulted in all 140 comments being correctly imported in a correct encoding and properly nested.
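
The nesting step is the only tricky part, and it can be tried without a database. This is a standalone sketch in Python 3, not an excerpt from the script: build_comment_tree and flatten are illustrative names, the sample ids are made up, and treating a comment whose parent is missing as top-level is my own assumption.

```python
def build_comment_tree(comments):
    """Turn a flat {id: comment} dict into a list of top-level comments.

    Each comment needs a 'parent_id' key (None for top-level comments);
    replies are collected in the 'children' list of their parent.
    """
    for comment in comments.values():
        comment.setdefault("children", [])

    roots = []
    for comment in comments.values():
        parent = comments.get(comment["parent_id"])
        if parent is None:
            roots.append(comment)    # top-level, or parent missing (assumption)
        else:
            parent["children"].append(comment)
    return roots


def flatten(roots, depth=0):
    """Yield (depth, id) pairs in the order the comments would be inserted."""
    for comment in roots:
        yield depth, comment["id"]
        for item in flatten(comment["children"], depth + 1):
            yield item


if __name__ == "__main__":
    sample = {
        "1": {"id": "1", "parent_id": None},
        "2": {"id": "2", "parent_id": "1"},
        "3": {"id": "3", "parent_id": "2"},
        "4": {"id": "4", "parent_id": None},
    }
    print(list(flatten(build_comment_tree(sample))))
```

Inserting in this order guarantees that a parent row exists, and its new database id is known, before any of its replies are written.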

If you happen to know a better way to import comments, feel free to provide the links to alternative solutions.