I’m back, for now, I think

In case anyone noticed that this blog has been down for a little over 3 months, it was because I logged in on my birthday and saw something that made me think the AWS EC2 instance this runs on had been hacked and was busy mining bitcoins for someone else. I don’t think they would get many bitcoins out of an EC2 t2.micro, but I guess they were taking anything they could get. I shut down the instance and left it off for the past few months. Today I finally took the time to look at it and decided to bring it back up and see what happens. Before doing that I created a new instance, detached the disk from the old instance and attached it as a second drive to the new one, where I glanced briefly at the logs to see if anything looked bad. Nothing did, so I made a snapshot of the drive, re-attached it to the old instance and started it up. After logging back in and updating the SSL certificate I was able to bring this blog back up.

I’m not sure when I’ll take the time to write anything else, but I will keep a closer eye on this blog, and if I see any more signs of hacking it will go away again for a while.

All the best,
Ari

May the Circle be Unbroken

Two weeks ago my career in IT came full circle. Fifty-five years ago, as a rising high school senior, I took a summer course in computers at Columbia University in NYC. It was a very comprehensive course which covered theories of computing, algorithms, compilers, etc. in the mornings and then taught us programming in Fortran and assembler for Columbia’s newly updated IBM 7094 computer in the afternoons. For many years, that course was my only formal training in computers and IT. Most of what I have learned since then was either self-taught or learned on the job.

The 7094 had recently been updated from a 7090. The hottest question at the time was what the difference was between a 7090 and a 7094, and the answer was 4. In addition to a few other changes in clock speed, instruction set, etc., a major change between the two computers was that the 7094 had 4 more index registers. The 36-bit instructions for these computers had 3 tag bits set aside to indicate the use of an index register, often used for stepping through memory in loops. In the 7090 those 3 bits selected one of 3 index registers. For the 7094, a binary decoder circuit was added so that the 3 bits could select one of 7 index registers; thus the update from 7090 to 7094.
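Just for illustration, here is a toy sketch, written from 55-year-old memory and definitely not a faithful hardware model, of how the same 3 tag bits can address 3 registers one way and 7 registers the other:

    def index_registers_7090(tag):
        """Each of the 3 tag bits selects one of the original 3 index registers;
        setting more than one bit effectively combined their contents."""
        return [bit for bit in (1, 2, 4) if tag & bit]

    def index_register_7094(tag):
        """With the added decoder, the 3 tag bits are read as a binary number,
        selecting one of 7 index registers (a tag of 0 means no indexing)."""
        return tag if tag else None

    for tag in range(8):
        print(f"tag {tag:03b}: 7090 -> {index_registers_7090(tag)}  7094 -> {index_register_7094(tag)}")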

Part of my decision to take a course in computers that summer was the news that my high school, Brooklyn Technical High School, was planning to get a new IBM 1130 computer in the fall. In addition to taking the course at Columbia that summer I also went to the IBM offices on Maiden Lane in lower Manhattan and bought a set of manuals for the IBM 1130 so that I would be ready to use the computer when it arrived. In fact, I ended up being the first person to successfully run a program on Tech’s 1130. My oldest brother was going to school at Brooklyn College at the time and was working on the IBM 1620 they had there. I was able to visit his office and use a keypunch machine there to punch my program onto the punch cards then used as input for computers.

As I recall, the program was to calculate great-circle bearings from NYC to other points on the globe. My father was a ham radio operator and he had installed a large, rotatable antenna on a tower alongside our home in Queens. He asked me for a table of bearings to point his antenna depending on where in the world he wanted to speak to someone, so I wrote the program, punched it onto cards at Brooklyn College (along with the Monitor Control Records, an early form of JCL for the 1130) and then walked into the computer room at Brooklyn Tech one day and asked the teacher there, I believe it was the head of the electronics course, if I could use the computer. He stared in amazement as I walked over to the card reader, inserted my card deck and proceeded to run the program, which generated many pages of output tables. He was sufficiently impressed by my ability to get the computer working that from that day on I was always welcome in the computer room, even if I was cutting a class to do it.
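The Fortran original is long gone, but the calculation itself is just the standard initial-bearing formula. A minimal modern sketch, in Python since that is what I write these days:

    import math

    def initial_bearing(lat1, lon1, lat2, lon2):
        """Initial great-circle bearing in degrees from point 1 to point 2
        (latitude/longitude in degrees, north and east positive)."""
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        x = math.sin(dlon) * math.cos(phi2)
        y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
        return math.degrees(math.atan2(x, y)) % 360

    # Rough bearing from NYC to London, about 51 degrees
    print(round(initial_bearing(40.7, -74.0, 51.5, -0.1)))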

One interesting sideline about the 1130 was that it was a desk-sized computer console with a removable rotating magnetic disk cartridge (IBM 2315) behind a panel in the desk stand. The monitor program, as well as the compiler and other utilities, was loaded onto the disk drive from 2 boxes of punched cards (about 4,000 cards in total). One item which I seem to have overlooked in the manuals I bought was that you had to turn off the disk drive with a switch behind the front panel before you shut off the computer. Otherwise, the retracting head of the magnetic drive would scribble all over the disk as the power was removed. We wondered why we had to load the 2 boxes of cards onto the magnetic drive every day, until someone realized that the switch inside the front panel would save us that chore.

There is irony in the fact that this IBM 1130 was bigger and more powerful than the IBM 1620 computer at the college I went to, although my college also managed to purchase an IBM 1130 during the summer before my senior year. More about that in another post.

The full circle I was talking about in the first paragraph of this post is that 2 weeks ago I started working as a system administrator in the computer research facility of the computer science department at Columbia. So my career in computing, which started with a course at Columbia 55 years ago, has now returned me to Columbia.

Sometimes You Just Need a Man-in-the-Middle (MITM)

Ok, to be politically correct I suppose it should be called a Person-in-the-Middle, but the acronym PITM is just too close to PITA for me. Maybe I’ll change it to Machine-in-the-Middle since that is what it usually is, but for me it was just a process on the receiving machine.

For those of you who don’t know it, MITM is often used in the context of an attack on web browsing. When a browser (like Chrome, Edge, Firefox, …) connects to a server, the name of the server is converted to an IP address via DNS and then the packets are routed between the browser and the server. Of course, if someone can corrupt your DNS or stick a malicious router into the path between your browser and the server, then they can read all of the packets that go back and forth. That is called a MITM attack.

The S in HTTPS stands for Secure and indicates that the connection between the browser and the server is encrypted using Transport Layer Security (TLS), an update to the earlier Secure Sockets Layer (SSL), which it turns out wasn’t really as secure as people hoped it would be. This is intended to protect against MITM attacks by making sure that the machine-in-the-middle can’t read the encrypted packets. When a browser connects to a server over HTTPS the server supplies its certificate to the browser so that the browser can confirm that it is connecting to the correct server. The certificate contains the name of the server and is cryptographically signed by a Certificate Authority (CA), which confirms that the server name belongs to the server. This is called Public Key Infrastructure (PKI), which is well beyond the scope of this blog post. If the name of the server in the certificate doesn’t match the name of the server you asked the browser to connect to, if the certificate is signed by a CA that your browser doesn’t trust, or if there is another problem with the certificate, like it being expired, then your browser will put up a warning about the certificate before letting you see the web site.

Some corporate web proxy servers, which connect computers in a corporate environment to the Internet, include a MITM which allows them to snoop on what their employees are doing on the Internet, even when the employee is using HTTPS. To keep their employees from getting the browser certificate warning, the proxy server has to create, on the fly, a certificate for the specific server which the browser is connecting to. Since no legitimate CA will sign such a certificate, the proxy server has its own CA, and any browsers which use that proxy have to install the CA certificate from the proxy server as a trusted CA.
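As an aside, you can watch that certificate check happen from Python’s standard library. A minimal sketch (example.com here is just an example host); it is the same check a browser does, minus the warning page:

    import socket, ssl

    hostname = "example.com"                  # any HTTPS server
    context = ssl.create_default_context()    # trusts the system's CA bundle

    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            # wrap_socket() raises ssl.SSLCertVerificationError if the name in the
            # certificate doesn't match, the CA isn't trusted, or the cert has expired.
            print(tls.getpeercert()["subject"])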

This blog post is about why I needed a MITM in order to solve a problem I was having. We have 6 Amazon Web Services (AWS) Elastic Compute Cloud (EC2) Virtual Machines (VMs) and I wanted all of them to send their logs to one common VM for analysis. These VMs came with RSyslogD, a standard Linux/Unix system logging utility, which I planned to use. Of course, the version of RSyslogD installed was 5.8.10, which was released in 2010 and last updated in 2012. For comparison, the latest version of RSyslogD is 8.2104. Since I am a card-carrying Certified Information Systems Security Professional (CISSP), I decided that the logs should be transmitted to the common log server over TLS, even though the servers were all in our AWS Virtual Private Cloud (VPC), meaning that no other computers should have access to our packets.

Configuring RSyslogD to use TLS wasn’t too hard, but we also had some Python programs written in-house which I wanted to have send their logs directly to the common log server without using RSyslogD on the local VM. If you read my earlier blog post about Python you know how much I love looking for and using standard Python modules. I was able to find a Python module which said it interfaced the standard Python logging module to the Syslog protocol used by RSyslogD. Of course, it didn’t support sending the packets over TLS, so I had to modify the module to wrap the packets in TLS. I was able to get the Python programs to send their logs successfully to the common log server using the Syslog protocol without TLS, but when I wrapped the packets in TLS the logs were ignored. I could see that the TLS connections were being made, but since TLS encrypts the packets I couldn’t see what was in them that kept things from working. The packets going between the RSyslogD processes on the servers were getting through, but my TLS-wrapped Python packets were being ignored.
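I no longer have that modified module handy, but the TLS wrapping itself amounted to something like the sketch below. The host name, port and CA file name are placeholders, not our real ones:

    import socket, ssl

    LOG_SERVER = "logs.example.internal"    # placeholder for our common log server
    PORT = 6514                             # the usual port for syslog over TLS

    context = ssl.create_default_context(cafile="local-ca.pem")   # trust our own CA

    raw = socket.create_connection((LOG_SERVER, PORT))
    with context.wrap_socket(raw, server_hostname=LOG_SERVER) as tls:
        # An RFC 5424-style message; this is what the server silently ignored,
        # because (as I found out later) it lacked the framing described below.
        message = "<134>1 2021-06-01T12:00:00Z myhost myapp - - - hello from Python"
        tls.sendall(message.encode("utf-8"))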

To figure out what the issue was, I needed a MITM where I could decrypt the packets and inspect them to see what the difference was between the working logs from the RSyslogD process and the non-working logs from my Python module. Since I was already so embedded in Python for this, I decided to write a MITM module in Python which would accept the encrypted TLS connection from the source, decrypt and display the logs, and then re-encrypt them and pass them on to the RSyslogD process on the common log server. Normally a MITM is blocked by browsers because it supplies a certificate whose name doesn’t match the server you are connecting to, or one signed by an untrusted CA, but in this case I didn’t have that problem. For these TLS connections I had created our own local CA, which only needed to be trusted by the VMs in our VPC. Since I had already been doing all this Python coding with TLS I had no problem cobbling together a simple MITM module, with which I quickly discovered that the working messages had the length of the log message inserted before the actual log message. Adding that to my Python TLS wrapper module got the log messages flowing cleanly. Of course, if I had Googled “Syslog over TLS protocol”, as I should have, I would have quickly found RFC 5425, which would have given me the needed answer without the necessity of a MITM, but what would have been the fun in that?
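For anyone else who hits this, the missing piece was RFC 5425’s “octet counting” framing: each syslog message sent over TLS must be preceded by its length in bytes and a single space. The fix was roughly this (a sketch, with a made-up example message):

    def frame_syslog_message(message):
        """RFC 5425 'octet counting': prefix each message with its length in bytes
        followed by a single space."""
        payload = message.encode("utf-8")
        return str(len(payload)).encode("ascii") + b" " + payload

    msg = "<134>1 2021-06-01T12:00:00Z myhost myapp - - - hello from Python"
    print(frame_syslog_message(msg))    # b'64 <134>1 2021-06-01T12:00:00Z ...'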

Simple Sample SAML Service Provider Programmed in Python

Try repeating the title of this blog posting three times fast.
And if you think that is tough, try coding one. I haven’t been paid to write computer programs since I left the research faculty of the Albert Einstein College of Medicine back in 1986, so my coding skills could be a little rusty. I have written lots of simple scripts, modified dozens of others in multiple languages, and helped programmers debug their programs in more languages than I can count, but more about that in another post. This one is a rant about Python.

I was first introduced to Python as a programming language in 2001, when my boss at Bear Stearns at the time (who shall remain nameless) promised the London research team that he would speed up the delivery of their research Emails. At the time we were using Sendmail with some intervening shell scripts to manage our outbound Emails, and the London research team was sending its research out to lists of several hundred addresses. Since Sendmail in those days was single-threaded, the first addresses in the list got the research fairly quickly (by the standards of those days). Unfortunately the owners of the lists didn’t always keep them clean and up-to-date, so there were often bad addresses or addresses with misspelled domains. Since Sendmail would try every MX server for a domain, or the A record for the domain if there were no MX records, every bad address or domain slowed down delivery to the subsequent recipient addresses. By the time Sendmail got to the end of the list the timely research was often obsolete, so our London research team, and their customers, were understandably upset. My boss’ solution was to obtain 2 Unix servers, powerful Solaris boxes at the time, download Postfix, Python and Mailman onto those servers, and then hand them over to me and resign from the firm. It became my job to put all this together so that the London research team, and ultimately several other Bear Stearns teams, could use these servers to send out their research in a timely manner.

Mailman, for those of you who don’t know it, is open-source mailing list manager software written in Python. The then-current version of Mailman did not include “Real name” support for members, which I see is now a feature of the current version, but our users required it, since they couldn’t be bothered knowing the actual Email addresses of their clients. That version also didn’t include the concept of a list member manager separate from the list manager, but we wanted our research people to be able to maintain their mailing lists without having any access to the other features of their mailing lists. Thus, I had to write an entirely new user interface for the Mailman mailing lists, one which allowed the list owners to import/add/delete real names and Email addresses for their clients but which hid from them the other features of Mailman and their lists. Fortunately Python is an object-oriented language and the Mailman lists were nested objects, so it was not too difficult to add attributes to the list objects for real names and to modify the user interface to restrict what our list owners could do. Of course, first I had to teach myself enough Python to understand the Mailman source code and figure out how to modify it. That took a couple of weeks. As I recall, the Mailman code was written in Python 1.5, so things have changed a lot since then, but that was my introduction to Python.

Fast forward to 2019, when I was helping an old friend from BS with some software she was writing in Python and she determined that she needed to be able to do Single Sign On (SSO) using SAML for one of her customers. This being the era of Linux, open-source software and shared library modules, I searched for a Python module that could be used as a SAML Service Provider. I found a few, but none had adequate documentation to just plug them in, and most were designed for specific web frameworks. My friend was writing her code in Python 2, using a web framework written by another old BS alumnus which mostly outputs JSON and could not supply the 302 status which the browser needed for a simple HTTP redirect to the SAML IdP. Also, this being 2019 and the last year that Python 2 will be supported (although I see that there are still some utilities which may not be Python 3 ready), my code had to work with Python 2 but be upward compatible with Python 3. I managed to get a working proof of concept (POC) for the code using Apache and Python 2 CGI, but it is still clunky.
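For what it’s worth, the piece the framework couldn’t do, redirecting the browser to the IdP, boils down to a deflated, base64-encoded AuthnRequest stuffed into a 302 response. Here is a rough CGI-style sketch; the IdP URL is a placeholder and the AuthnRequest shown is nowhere near complete:

    import base64, zlib, urllib.parse

    IDP_SSO_URL = "https://idp.example.com/sso"     # placeholder IdP endpoint

    # A real AuthnRequest also needs an ID, IssueInstant, Issuer, ACS URL, etc.
    authn_request = '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"/>'

    # HTTP-Redirect binding: raw DEFLATE (no zlib header), then base64, then URL-encode
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
    deflated = deflater.compress(authn_request.encode("utf-8")) + deflater.flush()
    saml_request = urllib.parse.quote_plus(base64.b64encode(deflated))

    # In a CGI script, a redirect is just these headers written to stdout
    print("Status: 302 Found")
    print("Location: " + IDP_SSO_URL + "?SAMLRequest=" + saml_request)
    print()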

Moving the code from Python 2 to Python 3 has been more of a headache than anticipated, mostly because of the change in the way strings are handled. Distinguishing between byte strings and Unicode strings is very necessary, but it becomes a pain to manage when modifying lots of legacy code (a small example follows this paragraph). But that’s not my major complaint about Python. Maybe my complaint is just because I haven’t taken the time to understand the issues involved, but it seems that the method Python uses for locating system modules has evolved over the last 20 years in not-always-compatible ways. The latest idea, of every application having its own environment with its own set of library modules, may make sense in these days of really cheap memory and storage, but it is difficult for us old-timers who are used to having limited memory to work with. Here again, I will save this for another post.
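The flavor of the string problem, for those who haven’t lived it (a trivial made-up example):

    # Python 2 happily mixed byte strings and text strings; Python 3 refuses.
    data = b"host=ursa "    # bytes, e.g. read from a socket or a file opened in binary mode
    name = "ari"            # str (Unicode text)

    # line = data + name                  # works in Python 2, TypeError in Python 3
    line = data + name.encode("utf-8")    # encode the text side...
    text = line.decode("utf-8")           # ...or decode the bytes side, but be consistent
    print(text)                           # host=ursa ari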

I’ve had several utilities which are coded in Python and which self-update, but which have been unable to find their modules since the default Python was changed from Python 2 to Python 3, even though they have their own version of Python in their environments. I’m not sure how to tell these utilities how to find the commonly installed modules, or how to install the needed modules into their specific environments. I’m sure I will figure this out in the next day or two, but it would have been great if Python were able to do it by itself without forcing me to go through these contortions to make things work.

Enough minor ranting for now, but I did make some promises above for more posts in the future. I’m hoping to put together more, shorter posts. We’ll see if I can do that.
Thanks for reading this.

DNS at Bear Stearns

In my last post I mentioned “DECnet terminal servers” which connected the dumb terminals to the big Vaxen in the Whippany datacenter. Those terminal servers actually used LAT, a DEC proprietary protocol, to connect the dumb terminals to the Vaxen. Shortly after I got to Bear Stearns, someone in the purchasing department discovered that they could purchase terminal servers which spoke LAT for less than the cost of the DEC terminal servers. These other terminal servers also spoke telnet over TCP/IP, so the same dumb terminals could be used to connect to the Amdahls and Sun Microsystems servers. Unfortunately those terminal servers needed DNS to resolve the names to IP addresses, and there weren’t any DNS servers at Bear Stearns. At the time, the Unix boxes were using NIS, which was then known as Sun’s Yellow Pages (YP).

I was able to get a TCP/IP stack for VMS from a company called TGV (Two Guys and a Vax, not to be confused with the French high-speed railroad). In addition to the TCP/IP stack, TGV also supplied versions of standard Unix utilities like Bind (the standard DNS server) and Sendmail (the standard SMTP server). Sendmail allowed me to connect the VMS Email system to Sun Unix Email, cc:Mail on the PCs and Profs on the mainframes. Bind let me run a DNS server on one of our spare VaxStations. The issue was creating the zone files (the lists which relate the host names to their IP addresses). There were hundreds of hosts and maintaining the Bind zone files by hand was not feasible.

Back in the days before the Internet, people shared information and programs via Usenet over UUCP. As I mentioned in my first post, Email in those days was delivered overnight. I won’t go into details about UUCP and Usenet in this post, but Wikipedia has articles on many of these topics. Usenet newsgroups were also transmitted via UUCP overnight and they allowed people to share information about programs and other issues. When I went looking for a program to convert Unix host files into Bind zone files I discovered that there was a program named h2n which consisted of several thousand lines of AWK code but was very flexible. To get the program I had to send an Email to several FTPmail servers asking for a list of available programs to see which server had h2n. Then I had to send an Email to the proper FTPmail server asking it to send me the program. In those days, programs came as several Email messages which had to be concatenated together to make a shell script which created the file (or files) needed to run the program. All in all it took several days to get the program.

Once I had the program I set up a scheduled job to run ypcat to save a copy of the NIS (YP) hosts map to a file, then I ran h2n against that hosts file to create the needed forward (name to IP address) and reverse (IP address to name) zone files. Those files were then copied to a VMS VaxStation (fixt31, sitting under my desk) which was running Bind, and that was used by the new terminal servers to map the names to IP addresses. That was the first DNS server at Bear Stearns, and it remained the master DNS server for many years.
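h2n itself is long gone from my fingertips, but the heart of the conversion is simple. A minimal sketch of the idea in Python (the zone names and addresses here are made up, and real zone files also need SOA and NS records):

    # Turn "IP-address  hostname" lines (like a dump from `ypcat hosts`) into
    # forward A records and reverse PTR records.
    DOMAIN = "bear.example.com."                 # made-up zone names
    REVERSE_ZONE = "1.168.192.in-addr.arpa."     # assumes one /24, unlike h2n

    hosts = [
        "192.168.1.31  fixt31",
        "192.168.1.2   fixd02",
    ]

    for line in hosts:
        ip, name = line.split()[:2]
        print(f"{name}.{DOMAIN}\tIN\tA\t{ip}")                                   # forward
        print(f"{ip.split('.')[-1]}.{REVERSE_ZONE}\tIN\tPTR\t{name}.{DOMAIN}")   # reverse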

Enough for today, I hope to get another post out in the next week or two. That one will be a break from my Bear Stearns past. I hope to write about a recent project I was working on.

More early BS (Bear Stearns) memories…

But first, Holly suggested that I left out two pieces of information from the last posting, so I’ll give you those. The first is how to say the name of this blog. In ‘techspeak’ the exclamation mark is called a ‘bang’, so my early Email address at Bear Stearns, as well as this blog, is pronounced sol bang ursa bang ari. The second piece of information that Holly said I should include is for those of you who have forgotten your Latin or your constellations. Ursa is Latin for bear, so ursa, the computer which was the UUCP gateway to Bear Stearns, was in effect named bear.

Speaking of Bear Stearns, back in 1988 when the head-hunter told me he was sending me for an interview at Bear Stearns I thought it was a streaking club. If you don’t get that, ask in the comments and I or someone else will explain it.

I was hired by the IT department at Bear Stearns to support the Fixed Income trading floor, specifically Pete, the new head of mortgage research. My initial assignment was to install a network of VaxStations running VMS for the traders, research analysts and developers. There were already a few VaxStations around the 4th (FI trading) and 5th (FI research) floors at 245 Park Ave., but most of the people had dumb DEC terminals (VT320 or color VT340) connected via DECnet terminal servers to the big Vaxen in the Whippany datacenter, along with several smaller green Quotron screens. On my first day on the job I found about 15 VaxStation 3200’s still on their pallets on the 5th floor, waiting for me to unpack them and install them around the group. The VaxStation 3200’s were desk-side units, about the size of a radiator in a typical NYC pre-war apartment, so they were difficult to install on the crowded 4th floor trading desks, but we managed to fit a few in, along with their 19 inch color CRT monitors which weighed about 70 pounds.

DEC soon came out with their VaxStation 3100 model, a desktop model more the size of a pizza box, so we ordered lots of those. Pete didn’t want his people to be able to take their software or data home with them on the 3-1/2 inch floppy disks which came standard with the VaxStation 3100’s, but DEC didn’t sell a preconfigured 3100 without the floppy drive, so he ordered diskless computers and then ordered SCSI controllers and 2 disks for each computer. That meant that I had to install the hardware and the software on each of the stations before it could be used. The 3100’s all came with thin-wire Ethernet, basically coaxial cable with BNC connectors, which allowed them to be easily networked together. They also each came with a Phillips-head screwdriver for assembly. I was able to install the controller cards and disks into the servers, then daisy-chain them together with the coaxial cable and connect them to one VaxStation which had the software installed.

I wrote some scripts to automatically install the software, setting the DECnet address for each computer based on the MAC address of its Ethernet interface so that each one got a unique address, and since the boxes had the MAC addresses, I was able to give each one a unique name tied to its address. All of this is standard now, but in those days it was a fairly new idea. I set up the configuration table of MAC addresses and names, then started the script and left for the night. The next morning I would come in to a half-dozen new workstations ready to be installed on the desks. I used a simple naming convention based on the use of the computer and a sequence number, so the ones I installed got boring names like fixd02 for a developer workstation or fixt31 for a trader workstation. The earlier VaxStations had more interesting names. The ones running VMS were named for denizens of the sea (fluke, mako, squid, etc.) while the ones running Ultrix (DEC’s proprietary Unix variant) were named for birds who ate denizens of the sea (osprey, heron, etc.). Interestingly enough, the Ultrix group eventually gave up and those boxes were reimaged with VMS, so the sea denizens ended up eating their predators.

Shortly before I got to 245 Park Ave., Bear Stearns’ networking group had started to run fiber to the trading desks, so each workstation that I installed had to have a fiber-to-Ethernet transceiver attached. In those days the standard Ethernet connector, when not using the thin-wire BNC connectors, was a 15-pin slide-lock connector, but Bear Stearns decided to replace all of the slide-locks with screw-down connectors because of a mishap with a Sun workstation on the equities trading desk. I only heard about that third-hand, so I won’t go into it here.

There’s lots more from these early days, some of which I still remember, but I’ll save more for another post. I hope this didn’t bore too many of you too much.

Introduction to my blog: sol!ursa!ari

Welcome to my blog. I’ll start with an explanation of the name of the blog. That was my Email address when I first joined Bear Stearns back in the late 1980’s. Yes, that was before most people knew what Email was, and before most people would say the Internet existed. Back in those days Email was delivered from computer to computer via UUCP (Unix to Unix CoPy) over POTS (Plain Old Telephone Service), the standard dial-up phone lines, mostly at night when the long-distance costs were lower. (Yes, in those old days telephone calls were charged by time of day, length of call and distance from caller to receiver.) Each computer only contacted a small number of other computers, so you needed a path from your computer to the recipient’s computer when sending Email. To simplify the process there was a UUCP mapping project which collected the list of computers and which other computers they exchanged Email with. Using the path mapping information and programs like pathalias you could find a path from your computer to any other computer which was listed and build the address from that path. In the case of sol!ursa!ari, sol was a well-known computer at Columbia University which our computer at Bear Stearns (ursa) called nightly to exchange Emails. Thus, I only needed to start my Email path with sol for most others to be able to find me. I was the user ari on the computer ursa, which exchanged Email with the well-known computer sol.

More about this, and lots of other things, in future posts.