Too Much to Ask
Make Mother's Day

Looking at the proliferation of personal web pages on the net,
- M G Siriam

Source: Funny Times, May 2000 (Both my sons sent me a copy of this.)

Mom's not the only one who wants an interesting website:

Source: Funny Times, October 2000

How Do I Know Who Visits My Website?

When computers are connected to a network, they are identified in a variety of ways, but two are particularly useful: IP addresses and domain names. On the Internet, every machine has an IP address (also called a network or logical address). Typically, every computer will have one IP address, and every IP address will map to one computer. Further, IP addresses are normally globally unique; the exception is when a proxy or NAT (Network Address Translation) box is in use, which will be discussed shortly. An example of an IP address is:

64.29.16.113

When you request a web page from a server, a request is constructed that contains the destination IP address (that of the web page's server) and the source IP address (that of your computer). On receiving your request, the web server will log the request and your source IP address, process the request, and send the reply (the requested page) back to your source IP address. This means a user cannot "spoof" (lie about) his IP address and still be able to browse the web, as all replies would go astray. He can, however, spoof his address when delivering a virus or when trying to waste the server's resources (called a DoS, or Denial of Service, attack), as he doesn't want a valid reply back in that case. The IP addresses contained in server logs can otherwise be assumed accurate, except when proxies or NAT boxes are involved (explained in detail below).

IP addresses aren't directly useful for humans, who generally prefer to give sites names (actually, the name applies to their server, group of servers, or leased portion of a server). This identification is called a domain name.
There's no requirement that a server have a domain name, but in practice virtually every server will have at least one so that the server (and the services it offers, such as hosted web pages) can be more easily located by users. Actually, most "normal" computers on the Internet will have a domain name as well. (By normal, I mean personal computers used by "real" people to browse the web, rather than spiders or crawlers merely cataloguing sites.) These computers usually have a unique domain name, albeit often only a temporarily assigned one; however, a permanent record is kept of these temporary assignments (more on this later). A sample domain name might be:

www.cosc.canterbury.ac.nz

This name can be parsed section by section: "www" identifies this server as a web server; "cosc" identifies this address as being assigned to a subsection (a Computer Science Department); "canterbury" identifies the name of the organisation (it is called Canterbury); "ac" identifies the organisation's type (it is an academic institution, therefore "canterbury" most likely refers to a university); "nz" identifies the country (New Zealand; therefore, this is the University of Canterbury, located in Christchurch). These names are useful for the people reading the address. A computer, on the other hand, merely resolves the domain name to a unique IP address and uses that (a domain name per se is meaningless to a computer).

Some years ago the name lookup server handling all of Microsoft's many servers went down; although every other server was still up, functioning, and processing requests, almost no one could access anything, as users knew only domain names and computers were unable to translate those names into IP addresses. Nowadays, Microsoft runs multiple name lookup servers (called DNS servers) to ensure this problem never recurs.
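The section-by-section reading of a domain name can be sketched in a few lines of Python. This is only an illustration: the sample name www.cosc.canterbury.ac.nz is assumed from the labels walked through in the text, and the function names and the hand-built table of meanings are hypothetical, not part of any real lookup service.

```python
def split_domain(name: str) -> list[str]:
    """Return the dot-separated labels of a domain name, leftmost first."""
    # Domain names are case-insensitive and may carry a trailing dot.
    cleaned = name.strip().rstrip(".").lower()
    return [label for label in cleaned.split(".") if label]


def describe_domain(name: str, meanings: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each label with a human-readable meaning, if one is known."""
    return [(label, meanings.get(label, "unknown")) for label in split_domain(name)]


# Hypothetical table, filled in by hand from the walkthrough in the text.
EXAMPLE_MEANINGS = {
    "www": "web server",
    "cosc": "Computer Science Department",
    "canterbury": "organisation name",
    "ac": "academic institution",
    "nz": "New Zealand",
}
```

Calling `describe_domain("www.cosc.canterbury.ac.nz", EXAMPLE_MEANINGS)` pairs each label with its meaning, which is all a human reader is doing when parsing the name. A computer does none of this; it simply asks a DNS server for the matching IP address.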
Another example:

mail.earthlink.net

Here, "mail" identifies a particular server (the mail server); "earthlink" identifies the organisation (a company called Earthlink); "net" identifies the type of organisation (Earthlink is, in fact, an ISP). Note that the country identifier is missing. In theory, this means the organisation isn't country-specific, but in practice it means it is likely based in the US. The US has a country-code top-level domain (ccTLD), "us", but few use it (most opting for ".com"). Again, this domain name is for the benefit of users, as a computer doesn't know what the words "mail" or "net" mean.

Every domain name must map to an IP address, but not all IP addresses have corresponding domain names. This mapping is done by DNS (Domain Name System) servers, which process domain name lookups and reverse domain name lookups (the first maps a domain to an IP, the second maps an IP to a domain). It is rare for a user to need a domain name lookup, but typing "nslookup domain" or "nslookup IP" into a command line window will perform one on most operating systems (including Windows XP). You can also query a whois server, although there exists no easy way for most users to do this (a good website statistics program will do it automatically, however).

You generally direct your request to view a webpage to a domain name, but your computer does not. When you ask to view a webpage named index.html on the site www.yahoo.com, your browser first asks a DNS server to return that domain's IP address and then sends the request there. Similarly, even if your computer has its own domain name, when it sends any web request, it attaches its IP address, not its domain name. A website's stats program may, if set up to do so, perform a reverse DNS lookup on IP addresses to map them to domain names. It then records these domain names in its own log so that the webmaster can see who is accessing the site.
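The forward and reverse lookups described above can be sketched with Python's standard `socket` module. This is a minimal sketch, not how any particular stats program works: the function names and the injectable `resolver` parameter are assumptions made here so the logic can be tried without a network connection.

```python
import socket
from typing import Callable, Iterable, Optional


def forward_lookup(domain: str,
                   resolver: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Map a domain name to an IP address, or None if it does not resolve."""
    try:
        return resolver(domain)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def reverse_lookup(ip: str, resolver=socket.gethostbyaddr) -> Optional[str]:
    """Map an IP address to a domain name, or None if it has no name."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror is a subclass of OSError
        return None


def annotate_ips(ips: Iterable[str], resolver=socket.gethostbyaddr) -> dict:
    """Replace each logged IP with its domain name where one exists."""
    # Not every IP address has a name, so fall back to the bare address.
    return {ip: reverse_lookup(ip, resolver) or ip for ip in ips}
```

With the default resolvers, `forward_lookup("www.yahoo.com")` does roughly what the browser does before sending a request, and `annotate_ips` does roughly what a stats program does when it turns logged addresses into names. The results depend on the live DNS at the time of the call.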
This information is often useful for sales analysis.

For most webpage requests, mapping the request back to a domain name isn't usually too useful, as most connections to the Internet resolve to a user's ISP, not to the user himself. The ISP generally runs a large modem bank with hundreds or even thousands of modems. Although who is logged in at any given time constantly changes, every current connection is assigned an IP address from a pool. (ISPs usually have far more customers than they do modems or IP addresses and simply hope all customers do not attempt to log on at once, which is usually a safe assumption.) For the duration of the connection, the assigned IP address will be unique and unchanging, but a customer will generally get a different address each time he or she begins a new session. A reverse DNS lookup on one of these IP addresses will resolve to a name, but it is generally only meaningful to the ISP. An example:

41.piscataway-15rh15rt.nj.dial-access.att.net

This probably means the user was on modem number 41 of a dial-up access server located in Piscataway, New Jersey, run by AT&T. The "15rh15rt" might identify the server, the computer, the IP address, or the user, if you were AT&T. For us, it is essentially useless. Further, the whole name (as well as the IP address it resolves to) might change at any time.

If the user thus identified (such as it is) is later discovered to have done something illegal or obnoxious, you might be able to convince the ISP to find out who the user was and to take appropriate action, such as notifying law enforcement authorities or terminating the user's contract. Most ISPs resist such action, as it consumes time and resources and gives other customers the idea that their privacy could be invaded on mere suspicion (the exception is if the user was sending spam, which almost invariably results in immediate termination of the user's contract, often on nothing more than suspicion).
To reiterate, the above example represents the most common type of domain name seen in a log file, and it isn't overly useful. At best, you can identify a user's country and sometimes his general geographic region. (Years ago, some ISPs embedded the dialup user's first initial and last name into the temporarily assigned domain name, but most users viewed that as an invasion of privacy, so few if any ISPs do that now.)

Domain names that resolve to a user or to his organisation are much more useful. An example of this type of domain name is:

mail.pwg.co.nz

This domain name identifies a server (probably a mail server, based on the name) of a New Zealand company identified as "pwg". They may have other servers as well, although in this case the name www.pwg.co.nz doesn't resolve to anything, which means they probably don't have a public web server. (Their mail server happens to be running a web server, but it is private and requires a username and password to access.) The mail server mail.pwg.co.nz may allow this company's employees to check their mail from home, and it probably has a webmail interface (allowing them to send mail) as well.

An organisation's server may be run "in house" by the firm's IT department, or if it's a smaller firm, it may pay an ISP to run the server. Either way, the entire organisation (or domain) is named (which is, of course, where the phrase "domain name" originated). As a practical matter, this means all requests originating from this organisation will have source IP addresses which resolve to the organisation's domain name. When an employee browses the web from his office, the source IP of the requests will be that of the organisation. When the web server does a reverse DNS request, all the employees' individual IP addresses will resolve to the company's domain name.

For our example, what would we get if we issued a "whois" request?

Domain Summary: This domain is currently listed in the Shared Registry
Name Servers

The Internet is short on IP addresses. There are billions of them, to be sure, but there are many millions of users too. Therefore, it was long ago decided that certain IP address blocks could be used by multiple companies, as long as traffic bearing these "internal" IP addresses stayed internal. When a computer which has been assigned an internal IP address wants to communicate with the outside world, something called a NAT box intercepts the packet, changes the internal source IP to its own external IP, and then sends the request out on the Internet. When a reply comes back, assuming that the request has not timed out, the NAT box is able to recognise which internal machine the reply is destined for, as it has an unfulfilled request which matches; it then replaces the external destination IP address with that machine's internal IP address and passes the reply along. The internal machine perceives itself as communicating directly with the web server; the web server, however, perceives itself to be communicating only with the NAT box.

This generally works quite well. It is common for dozens or even hundreds of machines to communicate with the Internet via a single NAT box, which in turn has a single "real" IP address. Since this has the benefit of being a fairly secure setup, it is often used even when the company is small enough (or has enough clout) to get a unique IP address for each computer and server in its network. (Proxy servers, incidentally, are an older, less reliable, and much less transparent technology. Nowadays, they have been almost completely replaced by NAT boxes.)

Use of a NAT box means that every user at the company will appear to be connecting from the same computer, typically the company's gateway or main router. Since this is also where the company's web server will appear to be located, it allows you to identify the employer of most people who browse the Internet while they're at work.

[Once I saw where a browser (mistakenly) arrived at my website after searching for "nude preteen girls".
A "whois" lookup allowed me to find that the source was a small company in a small town in New Zealand. The contact was a woman, so I sent her an email with a copy of the request and told her she may or may not be interested to know that someone was using his computer at work to browse for porn sites. She replied that they were a very small company of eight women and one man and, since there was no way the man would be guilty of looking at porn, it must've been a hacker. Of course it wasn't, but I let the matter drop, as I feel sure she mentioned it to him and sure he didn't use his work computer to pursue his desires after that. I don't care if someone is interested in porn, but children's porn is another matter.]

A few questions may still be lingering:
Q.: If you access a site direct from a dial-up modem rather than through a website, how much identifying data would be stored by the site you visited (presume a no-cookies setting on the web browser)? Or have I misinterpreted a point somewhere? Also, you mentioned being able to identify a party who has entered your site through another (referring) site (for example, Google). How much identifying data would be disclosed?

A.: If you are on a dial-up modem, a site you visit is not likely to get useful information. It will learn your connection's domain name (generally your ISP), but that usually contains no personal information. (I know of no NZ ISP that still embeds your name. netlink used to, but it's morphed into TelstraClear now, and they don't.)

Regardless of your connection type, if you click a link on one server which sends you to another server, the location of the link you clicked on is included in your request. I know where my address appeared that caused someone to arrive at my site. However, if the link is embedded in email, the referrer field is blank. If you did a Google search for flatrock, say, and clicked on one of the links that Google returned, then I would know that you had searched for me on Google and what keywords you had used, but no more about who you were than your connection's domain name as mentioned above. If you cut and pasted my address from the search rather than clicking on it, then I would not know where you came from nor the keywords you searched on. I would still know your connection's name.

A cookie merely helps you identify repeat visitors, not the visitors themselves. I would store a cookie on someone's computer with a code I could recognise. Then, when they return, I would know they had been to my site previously. But I would not know who they were through the cookie, though I might know that in other ways. Large advertising networks such as DoubleClick will have banner ads on several sites. When you visit one, they set a cookie.
When you visit another site with one of their banner ads, they can track your browsing habits (whether you click on their ads or not), but you are still identified as a code, not an identity. (That's the theory, anyway.) I use two browsers: Netscape without cookies, and IE for those times when the site absolutely requires it, like my bank.

Troubletown by Lloyd Dangle
Source: Funny Times, November 1999

How quickly things change...