Relational database theory part 2—mathematical foundations—the relation and the set

In which the author discloses a pet peeve and lays out the minimum necessary understanding of set theory for query writing and relational database design.

While the concepts had antecedents in thoughts and papers from earlier mathematical thinkers, the generally accepted origin of relational database theory is a paper by E. F. Codd, a mathematician at IBM, published in the Communications of the Association for Computing Machinery in June of 1970 and entitled “A Relational Model of Data for Large Shared Data Banks”.

The paper is written in mathematical language and is not very accessible to a non-mathematician. Fortunately, the concepts are laid out in simpler fashion in subsequent publications, such as the one by Chris Date that I previously mentioned, and I imagine in hundreds of computer science textbooks by now.

In the olden days, I attended a relational database session in which LargeFinCo brought in an expert to give the development staff one day’s worth of relational database theory. He started out by explaining that it was called ‘relational database theory’ because the tables had relationships between them. I prepared to object, but another programmer beat me to it, explaining that that is not the reason it’s called relational; in fact, the relation is a mathematical analogue of a table. It’s a minor point, really a point of style, but perhaps something to whip out during a heated debate over a relational database design, or to embarrass an instructor with only a superficial understanding of his course material—which we ought not do, as that would be mean.

Codd was a mathematician, and a “relation” is a mathematical object with many attributes and applications. For the purposes of our discussion, a relation can be thought of as a set of rows, each row a collection of name-value pairs, which can be represented as a table. A precise mathematical description of a database table can be expressed as “a set of n-tuples”, but we will get to that later.

The foundational building block of relational database theory is the mathematical concept of the set. While I proudly display the book Naive Set Theory by Halmos in my library, one need not go to such extremes. I do commend you to some fuller explanations of set theory available on the interweb and found by your favorite search engine. There are a few things you must memorize and understand at an intuitive level to work with relational databases—I believe a full enough explanation would run fewer than 10 pages. Most people learn these things by working with this stuff, but I believe a more formal grounding brings out some nuance that one might not pick up on the job.

Here are the concepts that I believe are fundamental to relational database theory, and extremely general in application to many aspects of computer technology:

Definition of a set: Mathematically, a set may be any arbitrary collection of designated objects. For instance, the set of integers greater than 3, or the letters in the English alphabet. When I talk about sets here, I will always be referring to sets whose members are all the same kind of object. This will be explained as we go.

Union of sets: The set of elements contained in either of two sets, with duplicates eliminated. For instance, the union of {a,b,c,d} and {d,e,f} is {a,b,c,d,e,f}. The letter d is in both sets but is included only once in the resulting set. Note that set operations are “closed” on sets; that is, sets are the input to the operation and the output is also a set.

Intersection of sets: The set of elements both sets have in common. For instance, the intersection of {a,b,c,d} and {d,e,f} is {d}.

Subtraction of sets: The removal from one set of the elements that also exist in another set. For instance, subtracting {d,e,f} from {a,b,c,d} yields {a,b,c}. This can also be expressed as identifying the intersection of the two sets and removing that intersection from the target set.

Multiplication of sets: The “Cartesian product”. Create a set containing all possible combinations of the elements in two sets. For example, the multiplication of {1,2,3} and {a,b} results in {1a, 1b, 2a, 2b, 3a, 3b}. Multiplication of large sets yields very large results.

Partition of a set: Identifying subsets of a set such that the union of the subsets yields the entire set, and the intersection of any two distinct subsets is empty. One common partitioning scheme is the division of the continental United States into the individual states and the District of Columbia, which is a federal animal and not a state, strictly speaking.

These concepts and set operations are ubiquitous in SQL query writing and relational database design.
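To make the mapping concrete, here is a minimal SQL sketch of the four operations. The tables s1 and s2 are my own invention for illustration, and the details vary a little by product: the multi-row VALUES insert is not universal, and Oracle spells subtraction MINUS rather than EXCEPT.

-- Build the two example sets as one-column tables.
CREATE TABLE s1 (letter CHAR(1));
CREATE TABLE s2 (letter CHAR(1));
INSERT INTO s1 VALUES ('a'), ('b'), ('c'), ('d');
INSERT INTO s2 VALUES ('d'), ('e'), ('f');

-- Union, duplicates eliminated: a, b, c, d, e, f
SELECT letter FROM s1
UNION
SELECT letter FROM s2;

-- Intersection: d
SELECT letter FROM s1
INTERSECT
SELECT letter FROM s2;

-- Subtraction: a, b, c
SELECT letter FROM s1
EXCEPT
SELECT letter FROM s2;

-- Cartesian product: all 4 x 3 = 12 combinations of the two sets
SELECT s1.letter, s2.letter
FROM s1 CROSS JOIN s2;

Note that UNION, INTERSECT, and EXCEPT eliminate duplicates from their results, which is the “closed on sets” property in action; the UNION ALL variant keeps duplicates and so returns something that is no longer a set.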

Relational database theory part 1—How I got here

In which the author reflects on some of the better aspects of a generally misspent youth.

It must have been 1995, and I was working as a developer at LargeFinCo*, primarily as a mainframe COBOL coder. I had gotten pretty good at using the IBM IMS database, which in computer science terms is a navigational database—one basically navigates an inverted tree structure. The database administrators were promoting the use of DB2, which is IBM’s relational database product.

I was conservative, technologically, and skeptical of all new things from IBM due to their track record of releasing buggy or short-lived products. I resisted the use of DB2. A friend expressed his excitement about DB2 and relational databases in general, and at some point I thought, “Maybe there are some new database concepts I ought to learn about”. I purchased a copy of Chris Date’s book “An Introduction to Database Systems”, and this small act, as can sometimes happen in life, changed the trajectory of my career.

In a high school math class, perhaps in 1976, I recall set operations being discussed while a Venn diagram was displayed on the chalkboard. Little did I know that the day or two spent on set theory was a glimpse of my destiny. So, back to LargeFinCo in 1995—Date’s book had the two qualities I like to see in a technology book: it was authoritative and comprehensive. It turned out I had a knack for writing SQL, the relational database query language, and I quickly became a go-to guy at work for people with difficult SQL questions—and that remains true today. At some point, perhaps a few years later, I was exposed to another relational product, Sybase SQL Server. This provided another glimpse into my future, as that product evolved into Microsoft SQL Server.

LargeFinCo, in the 1990s, had another process which proved indispensable to me—Database Design meetings. On a project that would use DB2 or Sybase, or even perhaps IMS back then, the developers, system users, and database administrators held a series of meetings in a room with lots of whiteboards. The DBAs would take the system users through a series of questions about their data and prepare the entity relationship diagrams which formed the database design they would implement. I note that at the time they used crow’s foot notation to indicate cardinality among tables, and perhaps because I bonded to it as the first approach I saw, as any baby animal bonds to the first thing it sees upon birth, I still use crow’s foot notation today, some 30 years later.

As with many computer design discussions, there were disputes among the participants about various points of the design, and at times these devolved into raised voices and anger—myself not least among them… It turned out that in an environment where the smallest nuances of design were endlessly debated, I acquired a comprehensive understanding of the design concepts—brand new to me then—which have formed the foundation of all of my design work since. Reflecting across the decades, the fundamentals of relational database design have changed very little from Chris Date’s articulation of them, and the same is true of the process of meeting with system users to document how their data can properly be contained within relational database tables. The moral of this story is that once you learn the fundamentals, you are good to go.

I have many system implementations under my belt, and one thing I noticed pretty early on with relational databases was that, once properly designed and implemented, the systems using these databases tended to be stable and the data accurate. When the database was improperly designed or implemented, I saw systems which were unstable, required constant attention from programmers, and had multiple data quality problems. One indicator of an improperly designed relational database is that developers and report writers use ‘select distinct’ frequently in their queries, even at multiple levels of a complex query.
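A hypothetical sketch of that symptom (the table and column names are mine): with no primary key declared, nothing stops a load process from inserting the same customer twice; every join against the table then fans out, and report writers compensate with ‘distinct’.

-- No primary key on customer, so duplicates slip in unnoticed.
CREATE TABLE customer (customer_id INT, customer_name VARCHAR(100));
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT, order_date DATE);

INSERT INTO customer VALUES (1, 'Acme Corp');
INSERT INTO customer VALUES (1, 'Acme Corp');  -- the same customer, loaded twice
INSERT INTO orders VALUES (101, 1, '2024-01-15');

-- The order now appears twice in a join, so the query gets "fixed" with DISTINCT:
SELECT DISTINCT c.customer_name, o.order_id, o.order_date
FROM customer c
JOIN orders o ON o.customer_id = c.customer_id;

-- With PRIMARY KEY (customer_id) declared on customer, the second insert
-- would have been rejected and the DISTINCT would be unnecessary.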

Relational database theory is beautiful to those who find certain mathematical concepts beautiful, and it is fundamentally a mathematical creature. The theory is determinate and not complicated once the fundamentals are mastered. Since getting past the initial ‘boot camp’ learning process, I have found it enjoyable and satisfying to build a solid and reliable system.

I note that over the years there have been data storage concepts that tout the advantage of not having the strict structure of proper relational design. One, ‘NoSQL’, appears to have been named in defiance of relational database theory. There seems to be some sort of emotional appeal to freedom from rules and a minimal learning curve, which makes things ‘easy’ to use. Some years ago I noted Chris Date arguing endlessly in internet forums (or their pre-internet equivalents) against those who advocated for inaccurate relational designs, or who dismissed the entire concept as obsolete. Yet, as concepts and products came into and fell out of fashion, a properly designed relational database has continued to provide a powerful and accurate storage mechanism in the right circumstances.

Certainly a relational database is not the appropriate tool for every storage problem, and at times I wonder whether an entire organization can lack the skills to build a good relational database. I have seen more than one vendor product with hundreds, or even thousands, of tables with zero primary keys declared and zero enforced referential integrity.

As I reflect on a 45-year career as a programmer, and as a man nearing retirement—or at least retirement age—I am proud of much of what I have learned and done. Relational database theory and SQL coding have in part paid for my house, my cars, and the raising of my children. All of this based on a passing thought I had one day: “Maybe there are some new database concepts I ought to learn about”.

* In my career, I spent many years working at large financial companies which I will refer to as LargeFinCo. They had large mainframe systems with DB2, and some had Sybase SQL Server. The specific identity of these companies is, ultimately, of no importance.

Extremely General Concepts: The Namespace

In math and computer science, general concepts are those that can be used in many situations. Being familiar with these concepts in a more formal manner helps us recognize them in the wild and incorporate them into designs or coding.

We are all familiar with namespaces, though maybe not under that name, and not in a systematic manner. For instance, telephone numbers comprise a namespace. Let’s think about some attributes of this namespace. We know that a telephone number must uniquely identify a single phone. We know there is some degree of hierarchy to telephone numbers. At the highest level we have a country code, and below that the area code. Decades ago, when phones were wired into fixed locations, the area code designated a specific geographic area, but with the advent of cell phones the area code no longer has geographic significance, and simply adds three digits to the original 7-digit phone number.

This one simple example provides an essential requirement for many namespaces—the ability to uniquely identify something.

Mailing addresses comprise another familiar namespace. Given the way mailing addresses evolved over the past 300 years, they are tightly coupled to geographic location, and they have more of a hierarchical structure. For mail within America, we have the 50 states and the District of Columbia (which is some sort of federal island and not considered a state).

Within states, we have cities, and within cities, we have streets and ultimately the house or building number. The mailing address namespace introduces two new concepts—the partitioning of a space, and the hierarchy. The partitioning and hierarchical structure allow us to conveniently designate a single address among hundreds of millions. It would be possible to assign something like a 10-character code to a house or building; this would be very convenient for a computer, but not for people.

More formally, a partition is a set operation whereby a set of some sort is divided into subsets. Partitioning the continental United States into states is a familiar example; in this case, the set is the land area. The formal definition of a partitioned space has two interesting set-based properties: the union of all the partitions yields the original set, and the intersection of any two distinct partitions yields the empty set. In other words, when a space is partitioned, there are no overlaps among the partitions.

As a state is partitioned into cities, we see that the partitioning of a space can yield a hierarchy. The hierarchical namespace is ubiquitous, and for those of us who have used computers a bit, the nested folder structure of files on a disk drive comes to mind.

When visualizing a hierarchical namespace, the inverted tree structure is often used. At the top is the root of the namespace and the lower nodes represent partitions of the higher nodes. So, an inverted tree diagram of mailing addresses in the US would have the US as the root node, and below that, a node for each state, with city nodes partitioning the state node.
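One minimal way to model such a hierarchical namespace in a relational table (the table and column names are my own, for illustration) is a self-referencing node table, where the uniqueness constraint enforces that a name is unique within its parent’s partition:

-- Each node points at its parent; the root has no parent.
CREATE TABLE namespace_node (
    node_id   INT PRIMARY KEY,
    parent_id INT NULL REFERENCES namespace_node (node_id),
    node_name VARCHAR(100) NOT NULL,
    UNIQUE (parent_id, node_name)  -- a name is unique within its parent's partition
);

-- US -> Virginia -> Richmond: each level partitions its parent.
INSERT INTO namespace_node VALUES (1, NULL, 'US');
INSERT INTO namespace_node VALUES (2, 1, 'Virginia');
INSERT INTO namespace_node VALUES (3, 2, 'Richmond');

-- A second 'Richmond' under a different state is fine; another 'Richmond'
-- under Virginia would violate the UNIQUE constraint.
INSERT INTO namespace_node VALUES (4, 1, 'Indiana');
INSERT INTO namespace_node VALUES (5, 4, 'Richmond');

The full path from the root (US, Virginia, Richmond) plays the role of the mailing address: each level narrows the partition until a single node is identified.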

So, when thinking about namespaces in a slightly more formal manner, there are some attributes of a namespace we can ask about. For example:

1) Does a namespace need to uniquely identify something?
2) Is there a hierarchical/partitioned structure to the namespace?
3) What might be the root of a namespace?
4) How many levels deep is a namespace?
5) How many unique elements can a namespace identify?

In sum, partitioning sets of things into namespaces is ubiquitous both in computer work and daily life. I hope this discussion helps you think about namespaces while navigating the real world as well as computer activities.

Onkyo 9090 Re-Cap

Produced from 1985 to 1997, the Onkyo 9090 FM tuner is the best FM tuner Onkyo ever produced (with the exception of a rare model that is very similar). The 1990s were the golden age of FM tuners; after that, the mass market moved toward home theater and digital sound sources, leaving us with 1990s tuners that remain outstanding even today. The 9090 has excellent sound quality and is able to pull in weak signals better than almost any tuner of that era or today, at least for under $800. Magnum Dynalab has an excellent line of very expensive and performant FM tuners today, but I think their cheapest model is around $800.

I have purchased three of these from eBay, and one thing they all have in common is that the large power supply capacitor next to a voltage regulator has blown out over the years. Two of the units had electrolyte from the capacitor on the circuit board, and the one I will talk about today had been sloppily repaired with an axial capacitor replacing the large cap, which presumably blew sometime in the previous decades. The problem with the 9090 is a design flaw from Onkyo: a physically large capacitor sits too close to the heat sink of a voltage regulator, which roasts it.

When I get a used 9090, it normally has spilled and baked-on electrolyte from this capacitor. I am able to remove the crust by scraping it with a piece of wood and cleaning it with alcohol. Even in this degraded state, the tuner still works, but there is no need for me to leave it that way.

In this unit, I replaced all of the electrolytics and ensured that the capacitors closest to the heat sink were rated for 125 degrees centigrade. I also thought about ways to increase the surface area of the heat sink, and ultimately decided to use a thermally conductive epoxy to glue two heat pipes onto it. The photo below shows the positioning of the voltage regulator heat sink and the capacitor; the heat sink runs at about 63 degrees centigrade according to my inexpensive infrared thermometer. I also note that the chassis of the 9090 has almost nothing in terms of ventilation and airflow past these hot parts.

This is what it looks like after I added the heat pipes. Again, not very scientific, but the heat sink appears to be running at 53 degrees centigrade with the heat pipes, a 10 degree improvement. 50 or 60 degrees does not seem very hot compared to caps rated for 85-125 degrees, but these measurements were taken before the top was put back on the tuner, which will reduce airflow. Also, a 125 degree cap is rated for that temperature for something like 2,000 to 4,000 hours. Heat is the mortal enemy of electrolytics, but I believe the higher-rated caps have a chemical paste inside that is formulated to tolerate high temperatures longer than a typical 85 degree cap would. Presumably cap technology has also improved in the last 30-some years. When I do a re-cap, I typically select the most expensive cap that meets the electrical requirements, so I hope that improves the situation overall.
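For a rough sense of what those temperature ratings buy, the usual rule of thumb (my addition here, not a measurement from this repair) is that electrolytic life roughly doubles for every 10 degrees the cap runs below its rated temperature:

\[ L \approx L_0 \times 2^{(T_{rated} - T_{actual})/10} = 2000 \times 2^{(125 - 63)/10} \approx 147{,}000 \text{ hours} \]

So even a 2,000-hour, 125 degree cap idling at 63 degrees should, in principle, outlast the rest of the tuner; it is sustained operation near the rated temperature that kills them quickly.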

The heat pipes glued on. The dark spots are reflections of the light, not discoloration of the metal. I hope the heat pipes are more effective than simple pieces of copper would be, but I have no way of knowing.

The view from the top–showing how close that cap is to the heat sink.

Another interesting aspect of re-capping this unit was a 22,000 uF capacitor that retains the presets when the unit is not plugged in. The original cap was the diameter of a dime and maybe 3/4 inch tall. I could not find a 22,000 uF cap of such small dimensions, so I got the size that was available and physically mounted it in some spare chassis space, since it is too tall to go on the PCB where the original one was. I missed this cap the first time I ordered parts because it was specified as something like .022 F. I thought it was .022 uF and would not be an electrolytic. After taking a closer look, I realized I had missed the fact that it was specified in farads–I have never seen that before in a schematic or parts list.

That’s about all I can say about fixing up my 9090. It’s a monster in terms of physical size, it has great RF performance, and it sounds great as well. Its clarity exposes the overly compressed and greatly degraded sound quality of most pop and rock stations. Only our local jazz station, 88.5, and classical music station, 99.5, demonstrate how wonderful FM can sound.

I “retailed” my Bryston 2B LB PRO

I love the Bryston 2B; it’s powerful and smooth, and it is only one rack unit tall. I have a pro model that I got, presumably, after it was scrapped by the Canadian Broadcasting Corporation. It has some wear on the outside, but the electronics are in pristine condition. I replaced the electrolytic caps a couple of years ago, and when I recently re-capped a retail model 2B that I got on eBay, I replaced the stock 4700 uF power supply caps with 6800 uF ones, which, thanks to newer technology, are the same size as the original 4700 uF caps.

The pro model is designed for use in a professional facility and does not have an on/off switch. It also comes with balanced XLR jacks on the back, which are standard in the industry. Well, it’s kind of a hassle to hook these up to my unbalanced home stereo equipment–I was using a high-fidelity balanced-to-unbalanced transformer.

While reading an ad for a 2B on eBay, I noticed that the seller had replaced the balanced inputs on a pro model with the consumer-standard RCA jacks. He noted that the input to the amp proper is single-ended, not balanced, and that the interface to the XLR jack is done with a small adapter circuit that contains an op amp and, horror of horrors to an audiophile, passes the signal through an electrolytic capacitor on its way to the amp. I studied the circuit diagrams of the pro and retail models and saw that the retail model simply took the input from the RCA jack and fed it straight to the input pin on the amp module circuit board. I resolved to implement it that way.

To physically mount the RCA jack in the XLR hole in the chassis, I found an adapter for that very purpose on eBay. They were expensive to my thinking, about $20 each, but when I received them I noted that they were made from aluminum and were mechanically and electrically very sound, so to speak…

Working with the tiny shielded cable was tedious, but I was able to create good solder connections on the jacks. Below are some after pictures. The amp works great with the RCA jacks, and this is a welcome development for me. Once I was done, the top went back on and the amp was placed back into service.

Guitar amp project: Phase splitter and input triodes checkout

Finished wiring up the phase splitter. For the circuit, I used the ubiquitous long-tailed pair found in a Marshall Plexi 50 watt. Due to my use of larger coupling caps, it was very fussy to wire up, and I worried about some error causing all sorts of problems or letting out the magic smoke. Imagine my pleasure when I hooked it up to the function generator and captured the voltage it was placing on the grids of the EL34s.

We see the 3.12 volt sine wave coming in and the resulting 64.8/67.2 volts on the EL34 grids, a voltage gain of about 21, or 26.5 dB. I note the two outputs are very close in voltage.
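For the record, the decibel figure is just the voltage ratio on a log scale; taking the midpoint of the two grid voltages as a worked example:

\[ A_v = \frac{V_{out}}{V_{in}} \approx \frac{66}{3.12} \approx 21, \qquad 20 \log_{10}(21) \approx 26.5 \text{ dB} \]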

Phase Splitter with the power tubes

The top traces are the grid voltages of the two EL34 tubes, and the bottom traces are their cathode currents, as measured across the one-ohm cathode resistors.
The same thing, driven by a square wave.

It’s days like this that a four-channel scope with screen capture is a wonderful thing. We see the power stage operating properly when driven by the phase splitter. The EL34s are not well matched, as evidenced by their different idle currents, but this is good enough for a general smoke test…

Input Stage Checkout

Input triodes driven by a square wave.

Doing a smoke test on the input triodes. The input signal is a 3 volt square wave–I wanted to see what kind of voltages they could emit, and these are delightfully high. The configuration here is the typical Marshall input with a regular and a ‘high’ stage, the difference being in the bias resistor and capacitor. The different responses to the square wave are to be expected. Here we are seeing a voltage gain of 18.7 on one and 23 on the other, or about 25.4 and 27.2 dB. Since the phase splitter emits close to full power drive voltage with a 3 volt input, I will need to look at how I want to structure the gain staging in the amp. I will not have a traditional tone control circuit, so the means of hooking this stage up to the phase splitter input will require some experimentation. There will be a rheostat for each side of the triode pair, and the relative levels there act to some extent as a tone control. A single input will drive the triodes, acting as the traditional jumper between the regular and high inputs.

Bench test of the phase splitter and output section. The voltmeter on the left is watching the B+ for the input triodes and the voltmeters on the right are not used in this setup.

Starting Over

After some years of low external interest and inaction on my part, my WordPress vendor was hacked and, from what I can determine, its content was lost. I changed vendors. I didn’t see the need to try to recover the old posts, and felt it was as good a time as any to make a fresh start on March 5, 2023.