SCHNUCKI project analysis

schnucki {at} 0x11.net

2007 · 01 · 20

General statistics

server   mails
cccc     38201
avalon    2601

Number of received mails per day

day 0: project start

Crawler intelligence

total
tag    mails  description
ld     10791  non-crippled mailaddress, linked, with description
li      8461  non-crippled mailaddress, linked, without description
no      6748  non-crippled mailaddress, not linked
rc      6273  additional string REMOVETHIS in local part
ra      5630  additional string NOSPAM in local part
hc      2087  several html comments in mailaddress
rb       757  additional string NOSPAM in domain
hd        30  some chars replaced by html entities
ha        17  several html comments in mailaddress, not linked
hb         7  some chars replaced by html entities, not linked
sh         1  "@" replaced by "at"
sg         0  "@" replaced by "at", not linked
sf         0  "@" replaced by "#"
se         0  "." and "@" replaced by "(dot)" and "(at)"
sd         0  "." and "@" replaced by "dot" and "at"
sc         0  not linked, "@" replaced by "#"
sb         0  not linked, "." and "@" replaced by "(dot)" and "(at)"
sa         0  not linked, "." and "@" replaced by "dot" and "at"
rd         0  additional string REMOVETHIS in domain
total  40802
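
The variants listed in the table can be illustrated with a small Python sketch. The exact strings the project published are not given on this page, so the spacing, the placement of NOSPAM and REMOVETHIS, the choice of entities and the comment text are assumptions, and `VARIANTS` and `obfuscate` are hypothetical names used only for illustration.

```python
# Illustrative only: how a few of the tagged variants might be written.
# Spacing, marker placement, entities and comment text are assumptions.
VARIANTS = {
    "sh": lambda local, domain: f"{local} at {domain}",
    "sf": lambda local, domain: f"{local}#{domain}",
    "sd": lambda local, domain: f"{local} at {domain}".replace(".", " dot "),
    "se": lambda local, domain: f"{local}(at){domain}".replace(".", "(dot)"),
    "ra": lambda local, domain: f"{local}NOSPAM@{domain}",
    "rb": lambda local, domain: f"{local}@NOSPAM.{domain}",
    "rc": lambda local, domain: f"{local}REMOVETHIS@{domain}",
    "hd": lambda local, domain: f"{local}&#64;{domain}".replace(".", "&#46;"),
    "hc": lambda local, domain: f"{local}<!-- x -->@<!-- y -->{domain}",
}


def obfuscate(address, tag):
    """Return the address rewritten according to the given tag."""
    local, domain = address.split("@", 1)
    return VARIANTS[tag](local, domain)


if __name__ == "__main__":
    for tag in VARIANTS:
        print(tag, obfuscate("user@example.org", tag))
```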

Replacing "@" with "at" or some other string seems to be effective: the variants sa to sh received one mail in total. Do not insert HTML comments into your mail address; the hc variant still received 2087 mails. I suppose the crawlers even speed up their processing by stripping HTML comments from the whole page before parsing it.
Note: the crawlers tried to remove REMOVETHIS and NOSPAM, but mostly did not succeed, as the tables below show.
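
To make the supposition about comment stripping concrete, here is a minimal Python sketch of a harvester that removes HTML comments and decodes entities before matching addresses. No real crawler code is shown on this page, so `harvest`, both regular expressions and the sample strings are assumptions.

```python
import html
import re

# Assumed patterns; a real harvester may use something quite different.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page):
    """Strip comments, decode entities, then collect address-like strings."""
    cleaned = COMMENT_RE.sub("", page)   # removes <!-- ... --> everywhere
    cleaned = html.unescape(cleaned)     # turns &#64; back into "@"
    return ADDRESS_RE.findall(cleaned)


if __name__ == "__main__":
    print(harvest("user<!-- x -->@<!-- y -->example.org"))
    print(harvest("user&#64;example&#46;org"))
    print(harvest("user at example.org"))
```

Under these assumptions the comment and entity variants are recovered, while the "at" variant yields nothing because no "@" ever appears in the cleaned text. An address carrying NOSPAM is harvested with the marker still in place unless the harvester removes it explicitly.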

valid
tag    mails  valid percent  description
ld      9740          90.2%  non-crippled mailaddress, linked, with description
li      7457          88.1%  non-crippled mailaddress, linked, without description
no      4542          67.3%  non-crippled mailaddress, not linked
hc      2065          98.9%  several html comments in mailaddress
rb       757         100.0%  additional string NOSPAM in domain
ra       715          12.7%  additional string NOSPAM in local part
hd        30         100.0%  some chars replaced by html entities
ha        17         100.0%  several html comments in mailaddress, not linked
hb         7         100.0%  some chars replaced by html entities, not linked
sh         1         100.0%  "@" replaced by "at"
sg         0              -  "@" replaced by "at", not linked
sf         0              -  "@" replaced by "#"
se         0              -  "." and "@" replaced by "(dot)" and "(at)"
sd         0              -  "." and "@" replaced by "dot" and "at"
sc         0              -  not linked, "@" replaced by "#"
sb         0              -  not linked, "." and "@" replaced by "(dot)" and "(at)"
sa         0              -  not linked, "." and "@" replaced by "dot" and "at"
rd         0              -  additional string REMOVETHIS in domain
rc         0              -  additional string REMOVETHIS in local part
total  25331          62.1%

tag and site only
(e.g. rc-cccc@schnucki.koeln.ccc.de)
tag    mails  description
rc      2016  additional string REMOVETHIS in local part
no      1985  non-crippled mailaddress, not linked
ra      1132  additional string NOSPAM in local part
ld      1048  non-crippled mailaddress, linked, with description
li      1003  non-crippled mailaddress, linked, without description

IP address too short (<8 chars)
tag    mails  description
hc        22  several html comments in mailaddress
ld         3  non-crippled mailaddress, linked, with description
no         1  non-crippled mailaddress, not linked
li         1  non-crippled mailaddress, linked, without description

no IP address
tag    mails  description
ra      1328  additional string NOSPAM in local part
rc      1299  additional string REMOVETHIS in local part

NOSPAM or REMOVETHIS not removed
tag    mails  description
rc      2825  additional string REMOVETHIS in local part
ra      2449  additional string NOSPAM in local part

NOSPAM / REMOVETHIS not removed and malformed IP
tag    mails  description
rc       133  additional string REMOVETHIS in local part
ra         2  additional string NOSPAM in local part

malformed timestamp and malformed IP
tag    mails  description
ra         4  additional string NOSPAM in local part

addresses beginning with "example"
tag    mails  description
no       220  non-crippled mailaddress, not linked
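
The counts in the tables above can be cross-checked against each other. Arithmetically, the tables that follow "valid" appear to account for exactly the mails that were not counted as valid; the page does not say so explicitly, so treat that reading as an assumption. The sketch below copies the non-zero counts and verifies, per tag, that valid plus rejected equals the total. The dictionary and function names are hypothetical.

```python
# Non-zero counts copied from the "total" and "valid" tables.
total = {"ld": 10791, "li": 8461, "no": 6748, "rc": 6273, "ra": 5630,
         "hc": 2087, "rb": 757, "hd": 30, "ha": 17, "hb": 7, "sh": 1}
valid = {"ld": 9740, "li": 7457, "no": 4542, "hc": 2065, "rb": 757,
         "ra": 715, "hd": 30, "ha": 17, "hb": 7, "sh": 1}

# Counts copied from the tables between "valid" and "Not matched".
rejected = {
    "tag and site only": {"rc": 2016, "no": 1985, "ra": 1132,
                          "ld": 1048, "li": 1003},
    "IP address too short": {"hc": 22, "ld": 3, "no": 1, "li": 1},
    "no IP address": {"ra": 1328, "rc": 1299},
    "marker not removed": {"rc": 2825, "ra": 2449},
    "marker not removed, malformed IP": {"rc": 133, "ra": 2},
    "malformed timestamp and IP": {"ra": 4},
    "begins with example": {"no": 220},
}


def rejected_per_tag(tables):
    """Sum the rejected mails of every table, grouped by tag."""
    sums = {}
    for rows in tables.values():
        for tag, mails in rows.items():
            sums[tag] = sums.get(tag, 0) + mails
    return sums


rejected_sums = rejected_per_tag(rejected)
# Tags whose valid + rejected counts do not add up to the total.
mismatches = [tag for tag in total
              if valid.get(tag, 0) + rejected_sums.get(tag, 0) != total[tag]]

if __name__ == "__main__":
    print("mismatching tags:", mismatches)
    print("rejected mails:", sum(rejected_sums.values()))
```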

Not matched

fuzzy spam
address                                             mails
sales@schnucki.koeln.ccc.de                           340
iamjustsendingthisleter@schnucki.koeln.ccc.de          19
info@schnucki.koeln.ccc.de                              6
noorclnrnotytmnpqdrel-racccc@schnucki.koeln.ccc.de      4
xxxaaron_axyug_fdgxxxxx@schnucki.koeln.ccc.de           3
nmqnj0llmjmplrqp-hdavalon@schnucki.koeln.ccc.de         2
julian_frederico66767_445-rt@schnucki.koeln.ccc.de      1

Crawlers

crawler IPs showing the first 20 of 114
IP                 mails  DNS reverse lookup
217.153.147.50      5189  unable to resolve
207.242.44.207      4298  unable to resolve
82.40.135.121       3478  82-40-135-121.cable.ubr02.pert.blueyonder.co.uk.
66.36.79.250        2425  unable to resolve
216.130.32.5        2324  unable to resolve
210.44.196.73       1961  unable to resolve
221.223.253.222     1840  unable to resolve
80.5.90.174         1822  unable to resolve
66.36.73.146        1356  unable to resolve
66.36.77.211        1271  unable to resolve
67.49.19.48         1204  cpe-67-49-19-48.socal.res.rr.com.
203.181.87.120      1160  p203-181-87-120.sub.ne.jp.
65.19.54.107         834  qwest107-dsl10.cybermesa.com.
217.153.147.50       424  unable to resolve
195.56.138.164       369  budaors-37.dialin.datanet.hu.
216.136.138.173      323  unable to resolve
66.47.249.148        246  user-112vuck.biz.mindspring.com.
216.136.138.172      238  unable to resolve
68.72.142.239        204  adsl-68-72-142-239.dsl.chcgil.ameritech.net.
24.135.112.14        201  unable to resolve
65.19.54.107         200  qwest107-dsl10.cybermesa.com.
66.47.246.111        184  user-112vtjf.biz.mindspring.com.
66.43.176.10         149  uslec-66-43-176-10.cust.uslec.net.
61.213.92.202        119  j092202.ppp.asahi-net.or.jp.
202.169.155.188      116  pc-202-169-155-188.cable.kumin.ne.jp.
222.226.47.251       107  KHP222226047251.ppp-bb.dion.ne.jp.
221.221.221.218      106  unable to resolve
66.44.196.69          93  ETCDSL-196-69.ellijay.com.
210.32.6.108          74  unable to resolve
210.32.6.108          74  unable to resolve
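
The page does not say how the "DNS reverse lookup" column was produced. One way to obtain such a column, sketched here in Python, is a PTR lookup with `socket.gethostbyaddr` and a fallback string when the lookup fails. The function name `reverse_lookup` and the injectable `resolver` parameter are assumptions made for illustration and testing.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the host name for ip with a trailing dot, as in the table,
    or "unable to resolve" if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        return "unable to resolve"
    return hostname.rstrip(".") + "."


if __name__ == "__main__":
    # Needs network access; results today will differ from the 2007 table.
    for ip in ("82.40.135.121", "217.153.147.50"):
        print(ip, reverse_lookup(ip))
```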

usage of a crawling result [by timestamp]

Compare this plot to the plot at the very top of this page. You will notice that very few crawling results from the last year were used to generate spam. The amount of spam, however, exploded within the last year. It seems that certain old crawling results are used to generate most of the spam. Because of this, there is little chance of reducing spam in the near future just by taking a mail address off the web.

usage of a crawling result [by crawler IP]

Interpretation: The databases of a few crawlers are used to generate most of the spam.