When I last wrote about the little 8-bit ISA compact-flash card I’d been working on, it was working using the logic and BIOS code of the original XT-IDE adapter with the ‘ChuckMod’, and delivered about 210KB/s read and 110KB/s write in an otherwise stock 5150. There’s long been a background of chatter on the vintage-computer forums about the potential advantage of implementing memory-mapped IO, so I decided to find out more.
Port IO vs Memory-Mapped IO
Most ISA cards use IO ports: at the hardware level, the first 10 address lines (A0..A9) and a pair of triggers (/IOR and /IOW). The timing of these signals is, however, identical to that of memory transfers, which use all 20 address lines and separate triggers (/MEMR and /MEMW). So why is there a difference in performance?
With the 8088/8086 CPUs, Intel provided instructions to copy blocks of memory, such as “rep movsw”, but an equivalent for port IO didn’t arrive until the 80186 and 80286 (it is also in NEC’s V20). So on a PC or PC-XT, separate load, store and loop instructions are needed, for example:
    .TransferLoop:
        in      ax, dx          ; get 16-bit word from controller
        stosw                   ; and store it in memory
        loop    .TransferLoop   ; and repeat
That runs at about 200KB/s on a 4.77MHz 8088, and any disk access based on it would be slower because of the file system and other overheads. Since the loop instruction is slow, it can be improved by ‘unrolling’ it a bit; this is the code used in the ‘ChuckMod’ BIOS for reads, which translates to about 210KB/s in real file-system throughput.
But, if the hardware can be arranged to make the sector data look like a block of memory, this code can be replaced with ‘rep movsw’, which runs at 360KB/s. There’s another advantage too; for machines with 16-bit buses (such as 8086 CPUs in systems with only 8-bit ISA slots), there’s no overhead in gathering the instructions from the 8-bit ROM to execute, leaving the external bus clear for just the IO itself.
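As a rough cross-check, the two throughput figures can be turned into CPU clocks spent per 16-bit word. This is only a back-of-envelope sketch: it assumes 1KB means 1024 bytes and takes the clock as exactly 4.77MHz, and the function name is made up for illustration.

```python
CPU_HZ = 4_770_000    # 4.77MHz, as quoted in the text
BYTES_PER_KB = 1024   # assumption: the figures may equally mean 1000 bytes


def clocks_per_word(kb_per_second: float) -> float:
    """CPU clocks available for each 16-bit word at a given throughput."""
    bytes_per_second = kb_per_second * BYTES_PER_KB
    return CPU_HZ / bytes_per_second * 2


for label, rate in (("in/stosw/loop", 200), ("rep movsw", 360)):
    print(f"{label}: about {clocks_per_word(rate):.0f} clocks per word")
```

On those assumptions the port loop costs roughly 47 clocks per word and ‘rep movsw’ roughly 26, so the gain comes almost entirely from dropping the per-word instruction overhead.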
To make each sector of data look like a memory block, we don’t need any memory on the card; we just need to respond to the ascending memory addresses the CPU will be asking for.
That in itself is quite easy – we just ignore the address lines that will be changing – but it’s complicated by three of those lines being used to communicate with the IDE interface already. But since this board is CPLD based, it’s relatively easy to add logic to hold those low during a memory-mapped transfer.
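The idea can be modelled in a few lines. This is a sketch of the behaviour, not the CPLD source: the 512-byte window size, the choice of which three lines feed the IDE register select, and all the names are assumptions made for illustration. The base of D8000h is the physical address of segment D800h, which the alpha BIOS is hard-coded to.

```python
SECTOR_WINDOW_BITS = 9          # assumption: the window covers 512 bytes (A0..A8)
IDE_SELECT_LINES = 0b0000_1110  # assumption: which three lines feed the IDE register select


def window_hit(addr: int, base: int) -> bool:
    """Compare only the address bits above the window; A0..A8 are ignored."""
    return (addr >> SECTOR_WINDOW_BITS) == (base >> SECTOR_WINDOW_BITS)


def ide_select(addr: int, memory_mapped: bool) -> int:
    """Value presented on the IDE register-select lines for this access."""
    if memory_mapped:
        return 0  # forced low, so the drive always sees its data register
    return addr & IDE_SELECT_LINES


BASE = 0xD8000  # physical address of segment D800h

# The 512 ascending addresses 'rep movsw' would generate for one sector.
sector_addresses = range(BASE, BASE + 512)
print(all(window_hit(a, BASE) for a in sector_addresses))             # True
print({ide_select(a, memory_mapped=True) for a in sector_addresses})  # {0}
```

Every address in the window selects the card, and the drive never sees the changing low bits.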
The first hurdle was to speed up writes generally. Although the ChuckMod design makes access to the ports sequential, a write has to reach the drive on the second byte, whereas a read has to be fetched from the drive on the first. That is why the ChuckMod design writes more slowly than it reads: it needs byte transfers for writes.
To get past that, I tried to distinguish writes using /IOR and /IOW triggers, but this wasn’t stable since those signals are sent slightly after the address lines, and the IDE interface expects that ordering. We need to know if it’s a read or write at the start, so in my design I use an address line: reads are via port x00h as usual, but writes are to port x10h.
The 8- to 16-bit MUX itself then becomes quite simple: just a single byte latch, some buffers, and a few logic gates to work out which way data is travelling.
And amazingly, it worked! Write speed is nearly doubled with this logic, running at over 200KB/s.
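The read-on-first-byte, write-on-second-byte ordering is easy to see in a small behavioural model. This is a sketch under assumptions, not the CPLD logic: it assumes A0 picks the low or high byte (as the sequential ports suggest), uses a hypothetical port base of 300h, and `FakeDrive` and `ByteMux` are invented names. In the real design the x00h/x10h split tells the hardware the direction before the strobes arrive; here the choice of method does that job.

```python
class FakeDrive:
    """Stand-in for the 16-bit IDE data register (hypothetical test double)."""

    def __init__(self, words):
        self.to_read = list(words)
        self.written = []

    def read_word(self):
        return self.to_read.pop(0)

    def write_word(self, word):
        self.written.append(word)


class ByteMux:
    """8-bit host side, 16-bit drive side, one byte latch in between."""

    def __init__(self, drive):
        self.drive = drive
        self.latch = 0

    def read_byte(self, addr):
        if addr & 1 == 0:
            word = self.drive.read_word()  # drive is read on the FIRST byte
            self.latch = word >> 8         # high byte waits in the latch
            return word & 0xFF
        return self.latch                  # second byte comes from the latch

    def write_byte(self, addr, value):
        if addr & 1 == 0:
            self.latch = value & 0xFF      # low byte waits in the latch
        else:
            # drive is written on the SECOND byte, with both halves present
            self.drive.write_word(((value & 0xFF) << 8) | self.latch)


drive = FakeDrive([0x1234])
mux = ByteMux(drive)
print(hex(mux.read_byte(0x300)), hex(mux.read_byte(0x301)))  # 0x34 0x12
mux.write_byte(0x310, 0xCD)
mux.write_byte(0x311, 0xAB)
print([hex(w) for w in drive.written])                       # ['0xabcd']
```

Each pair of byte accesses from the host becomes a single 16-bit access at the drive, in either direction.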
So then to memory-mapped support. I’ve added the capability on top of the existing logic, so the design can work with either port or memory-mapped IO. This needs only a couple of changes to the MUX module, but a whole second address decoder section (the bottom half of the schematic).
To distinguish writes on memory-mapped transfers (to get the IDE transfer ordering correct), I couldn’t use A4 as with port-based IO, so A9 is used instead: reads are from Base+0000h and writes are to Base+0200h. One other change is to include a test for A19 on both halves, so only one of the decoders can produce an output.
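A minimal model of that memory-side decode might look like the following. It is a sketch only: it assumes the memory half answers when A19 is high (true for a base of D800h) while the port half would require A19 low, and that the decoder compares A10..A19, which makes the window 1KB in total. The names are invented for illustration.

```python
A9 = 1 << 9
A19 = 1 << 19
WINDOW_MASK = 0xFFFFF & ~0x3FF  # assumption: compare A10..A19, ignore A0..A9


def decode_memory(addr: int, base: int):
    """Return 'read', 'write' or None for a 20-bit memory address."""
    if not addr & A19:
        return None  # assumption: the memory decoder only answers with A19 high
    if (addr & WINDOW_MASK) != (base & WINDOW_MASK):
        return None  # outside the card's window
    return "write" if addr & A9 else "read"


BASE = 0xD8000  # physical address of segment D800h
print(decode_memory(BASE + 0x0000, BASE))  # read
print(decode_memory(BASE + 0x0200, BASE))  # write
print(decode_memory(BASE + 0x0400, BASE))  # None
```

Because the direction is carried by an address line, it is known at the start of the cycle, before the memory read or write strobe arrives.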
The universal BIOS developers requested some way of identifying the card, so I added a ‘device ID’ constant at an unused port address (x0Fh). Then came the idea of loading the memory-mapped IO base address into the card via the same port, hence this buffer design, with the latch on the left storing the memory-mapped base address.
The design looked like it should work, but fitting it into the CPLD proved difficult. After a bit of reading, some tweaks to the settings finally got it fitted.
The next challenge was the BIOS modifications, with the vintage-computer gurus kindly lending a hand. With all of that done, the board finally came to life with speeds comparable to the fastest cards from back in the day:
| | Port-IO | Memory-Mapped IO |
|---|---|---|
| Reads | 250KB/s | 315KB/s |
| Writes | 205KB/s | 265KB/s |
Enabling the write-cache on a microdrive I was using for testing, writes increased to 300KB/s.
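To put those results next to the starting point, the gain over the original ChuckMod figures (210KB/s read, 110KB/s write) can be worked out directly. This is plain arithmetic on the numbers quoted in this post, and it assumes both sets were measured on a comparable machine and drive, without the write-cache.

```python
chuckmod = {"reads": 210, "writes": 110}       # KB/s, original logic with the ChuckMod
memory_mapped = {"reads": 315, "writes": 265}  # KB/s, memory-mapped results quoted above


def gain_percent(before: float, after: float) -> float:
    """Percentage improvement of 'after' over 'before'."""
    return (after - before) / before * 100


for op in ("reads", "writes"):
    print(f"{op}: +{gain_percent(chuckmod[op], memory_mapped[op]):.0f}%")
```

That works out at about +50% for reads and +141% for writes.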
The BIOS code is very much at the ‘alpha’ stage – I’ve not found any problems with it, but it’s hard-coded to use a transfer window at a base address of D800h.
Still to do: better BIOS support with on-the-fly memory-mapped configuration, more system and drive testing, and looking into whether a production run of assembled boards is feasible. None of that is too bad, and the project is nearly finished!
After that I have a couple more ideas – to re-visit the idea of a board with both compact flash and 40-pin header, and to port the design to the Tandy 1400FD.
About the XT-CF
The Peacon XT-CF has been designed as a cheap-to-make 8-bit ISA board providing a bootable compact-flash socket to any PC with an ISA slot. The board draws on the basic physical design of the Dangerous Prototypes XT-IDE V2 board, itself inspired by the original XT/IDE project. The board makes use of the interoperability of 3.3V and 5V logic signals to enable the use of low-cost 3.3V surface-mount components, and has been sized to fit within board size limits of Seeedstudio for low-cost production.
Hobbyist home-assembly is entirely possible with a few basic tools and some patience, and PCBs and slot brackets are available now (get in touch here). Everything else you need (parts list, CPLD source, and BIOS) is available in the wiki, along with a more detailed technical description.