lo-tech Blog page 5

Accelerating the lo-tech XT-CF adapter

When I last wrote about the little 8-bit ISA compact-flash card I’d been working on, it was working using the logic and BIOS code of the original XT-IDE adapter with the ‘ChuckMod’, and delivered about 210KB/s read and 110KB/s write in an otherwise stock 5150.  There’s long been a background of chatter on the vintage-computer forums about the potential advantage of implementing memory-mapped IO, so I decided to find out more.

Port IO vs Memory-Mapped IO

Most ISA cards use IO ports – at the hardware level, the first 10 address lines (A0..A9) and a pair of triggers (/IOR and /IOW).  The timing of these signals is, however, identical to that of memory transfers, which use all 20 address lines and separate triggers (/MEMR and /MEMW).  So why is there a difference in performance?

With the 8088/8086 CPUs, Intel provided instructions to copy blocks of memory – such as “rep movsw” – but an equivalent for port IO didn’t arrive until the 80186 and 80286 (it is also in NEC’s V20).  So on a PC or PC-XT, separate load, store and loop instructions are needed, for example:

.TransferLoop:
   in       ax, dx     ; get 16-bit word from controller
   stosw               ; and store it in memory
   loop .TransferLoop  ; and repeat

That runs at about 200KB/s on a 4.77MHz 8088 (any disk access based on it would be slower, because of the file system and other overheads).  It can be improved by ‘unrolling’ the loop a bit, since the loop instruction is slow; the unrolled form is the code used in the ‘ChuckMod’ BIOS for reads, which translates to about 210KB/s in real file-system throughput.

But, if the hardware can be arranged to make the sector data look like a block of memory, this code can be replaced with ‘rep movsw’, which runs at 360KB/s.  There’s another advantage too; for machines with 16-bit buses (such as 8086 CPUs in systems with only 8-bit ISA slots), there’s no overhead in gathering the instructions from the 8-bit ROM to execute, leaving the external bus clear for just the IO itself.
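As a rough cross-check of the figures above, the quoted throughput can be turned into CPU clocks spent per 16-bit word.  This is only a sketch: it assumes the 4.77MHz clock quoted above and that 1KB means 1024 bytes, and the function name is purely illustrative.

```python
CPU_HZ = 4_770_000  # 4.77MHz 8088, as quoted in the text


def clocks_per_word(kb_per_s, cpu_hz=CPU_HZ):
    """CPU clocks spent per 16-bit word at a given throughput (1KB = 1024 bytes, assumed)."""
    words_per_s = kb_per_s * 1024 / 2
    return cpu_hz / words_per_s


for label, rate in (("in/stosw/loop", 200), ("rep movsw", 360)):
    print(f"{label}: about {clocks_per_word(rate):.1f} clocks per word")
```

On these assumptions the port-IO loop spends roughly 47 clocks on every word, against roughly 26 for ‘rep movsw’.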

New Logic

To make each sector of data look like a memory block, we don’t need any memory on the card; we just need to respond to the ascending memory addresses the CPU will be asking for.

That in itself is quite easy – we just ignore the address lines that will be changing – but it’s complicated by three of those lines being used to communicate with the IDE interface already.  But since this board is CPLD based, it’s relatively easy to add logic to hold those low during a memory-mapped transfer.

My Solution

The first hurdle was to speed up writes generally.  Although the ChuckMod design makes access to the ports sequential, for writes we need to write to the drive on the second byte, whereas for reads we need to read on the first.  This is why write speeds are lower than read speeds for the ChuckMod design (it needs byte transfers for writes).

To get past that, I tried to distinguish writes using /IOR and /IOW triggers, but this wasn’t stable since those signals are sent slightly after the address lines, and the IDE interface expects that ordering.  We need to know if it’s a read or write at the start, so in my design I use an address line: reads are via port x00h as usual, but writes are to port x10h.

The 8- to 16-bit MUX itself then becomes quite simple, just a single byte latch and some buffers, and a few logic gates to work out which way data is travelling:

And amazingly, it worked!  Write speed is nearly doubled with this logic, running at over 200KB/s.
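The ordering problem is easier to see in a toy model.  The sketch below is not the CPLD logic, just a behavioural illustration in Python: the class and method names are made up, and it assumes the first byte of each pair is the low byte of the 16-bit word.

```python
class ByteLatchMux:
    """Toy model of a single byte latch between an 8-bit bus and a 16-bit drive."""

    def __init__(self, drive_words):
        self.drive_words = list(drive_words)  # words the drive will hand back on reads
        self.written = []                     # words delivered to the drive on writes
        self.latch = None                     # the single byte latch

    def read_byte(self):
        if self.latch is None:
            word = self.drive_words.pop(0)    # first byte: the 16-bit drive read happens now
            self.latch = word >> 8            # hold the high byte for the next bus cycle
            return word & 0xFF
        high, self.latch = self.latch, None   # second byte: served from the latch
        return high

    def write_byte(self, value):
        if self.latch is None:
            self.latch = value & 0xFF         # first byte: only latched, drive not touched
            return
        self.written.append(((value & 0xFF) << 8) | self.latch)  # second byte: 16-bit drive write
        self.latch = None


mux = ByteLatchMux([0x1234])
print(hex(mux.read_byte()), hex(mux.read_byte()))
mux.write_byte(0xCD)
mux.write_byte(0xAB)
print([hex(w) for w in mux.written])
```

The drive is accessed on the first byte of a read but on the second byte of a write, which is why the card has to know the direction before the cycle starts.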

So then to memory-mapped support.  I’ve literally added-on the capability, making it possible for the design to work with either port or memory-mapped IO.  This gives only a couple of changes to the MUX module but a whole second address decoder section for it (the bottom half):

To distinguish writes on memory-mapped transfers (to get the IDE transfer ordering correct), I couldn’t use A4 as with port based IO, so A9 is used instead, so reads are from Base+0000h and writes are to Base+0200h.  One other change is to include a test for A19 on both halves, so only one of the decoders can produce an output.
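The memory-mapped decode can be sketched in a few lines of Python.  This is an illustration under assumptions, not the CPLD source: it assumes the decoder compares A19..A10 against the base (a 1KB window), ignores A8..A0, and uses the D800h segment that the alpha BIOS is hard-coded to; the function name is hypothetical.

```python
WINDOW_BASE = 0xD8000  # physical address of segment D800h (assumed base)


def decode_memory_cycle(address, base=WINDOW_BASE):
    """Return None if the card is not selected, else 'read' or 'write' for the window hit."""
    if (address >> 10) != (base >> 10):    # compare A19..A10 only; A8..A0 are ignored
        return None
    return "write" if address & 0x200 else "read"  # A9 picks the direction


for addr in (0xD8000, 0xD81FF, 0xD8200, 0xD8400):
    print(hex(addr), decode_memory_cycle(addr))
```

Every address in a 512-byte run selects the same data register, so ‘rep movsw’ can walk through the window while the card always presents the next word of the sector.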

The universal BIOS developers requested some way of identifying the card, so I added a ‘device ID’ constant at an unused port address (x0Fh).  Then came the idea of loading the memory-mapped IO base address into the card via the same port, hence this buffer design, the latch on the left storing the memory-mapped base address:

The design looked like it should work, but fitting it into the CPLD proved difficult.  After a bit of reading, some tweaks to the settings finally got it fitted.

Next challenge was the BIOS modifications, the vintage-computer gurus kindly lending a hand.  With all of that done, finally the board came to life with speeds comparable to the fastest cards from back in the day:

           Port-IO     Memory-Mapped IO
 Reads     250KB/s        315KB/s
 Writes    205KB/s        265KB/s

Enabling the write-cache on a microdrive I was using for testing, writes increased to 300KB/s.

The BIOS code is very much at the ‘alpha’ stage – I’ve not found any problems with it, but it’s hard-coded to use a transfer window at a base address of D800h.

What’s Next

Better BIOS support, with on-the-fly memory-mapped configuration, more system and drive testing, and looking into whether a production run of assembled boards is feasible.  None of that is too bad – the project is nearly finished!

After that I have a couple more ideas – to re-visit the idea of a board with both compact flash and 40-pin header, and to port the design to the Tandy 1400FD.

About the XT-CF

The Peacon XT-CF has been designed as a cheap-to-make 8-bit ISA board providing a bootable compact-flash socket to any PC with an ISA slot.  The board draws on the basic physical design of the Dangerous Prototypes XT-IDE V2 board, itself inspired by the original XT/IDE project.  The board makes use of the interoperability of 3.3V and 5V logic signals to enable the use of low-cost 3.3V surface-mount components, and has been sized to fit within board size limits of Seeedstudio for low-cost production.

Hobbyist home-assembly is entirely possible with a few basic tools and some patience, and PCBs and slot brackets are available now (get in touch here).  Everything else you need – parts list, CPLD source, and BIOS – is available in the wiki along with a more detailed technical description.

XT/IDE – The Future

So what’s next for the XT/IDE project?  Besides the few DPv1b and DPv2 boards left, effectively none of the boards are available.  For the DPv2, many hobbyists will be unwilling to tackle SMT boards and being just over 100mm wide, it’s expensive to have made by (eg) Seeedstudio as a one-off.

Andrew Lynch periodically orders a short run of the ‘Mk.II’ XT/IDE board, although currently it looks as though all stock is depleted.  Other projects, such as the ‘JR-IDE’ controller for the PCjr, are ongoing but again have no boards available to buy for an ordinary PC as of right now.

Dangerous Prototypes v2

One motivation behind the Dangerous Prototypes development with SMT was the possibility of a cost-effective short run of finished boards – professionally assembled with the CPLD and EEPROM programmed.  In effect, the first new commercial 8-bit ISA card probably for at least 15 years 🙂

The problem is quantities, as to be viable it needs demand enough for 100 boards or more.  Since there’s probably only been 100 of the original XT/IDE boards ever made, that just seems unlikely.

Peacon XT-CF

Separately, I wanted to re-work the DPv2 design to use a Compact Flash (CF) header instead of a 40-pin header.  A quick survey of VC Forum members showed interest in CF, but only with a 40-pin header too.

Why Compact Flash?

DOS can only access about 8GB, which is way more than could ever be filled with XT class software, yet CF cards that size are currently about £10.  And being solid-state, they have near zero command latency – look at the performance numbers side-by-side with the ST-412 in the ubiquitous PC/XT:

              ST-412    C.F.
Read (KB/s):      64     210
Write (KB/s):     62     110
Sector IOPS:       6      13

And don’t think CF cards top-out at 13 IOPS; that’s just a reflection of the file system processing speed with a 4.77MHz 8088.  The same CF card on the same XT/IDE controller (with ‘chuck-mod’) in a Pentium-200 turns in over 1MB/s and 700 IOPS – compared to about 140 IOPS from a current SATA drive.
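The ‘about 8GB’ figure quoted earlier follows from the cylinder/head/sector addressing that DOS relies on.  A quick sketch of the arithmetic, assuming the commonly cited limits of 1024 cylinders, 255 heads and 63 sectors per track with 512-byte sectors (these limits are not stated in this post):

```python
CYLINDERS = 1024       # assumed BIOS CHS limit
HEADS = 255            # assumed limit as used by DOS
SECTORS_PER_TRACK = 63
SECTOR_BYTES = 512

total_sectors = CYLINDERS * HEADS * SECTORS_PER_TRACK
total_bytes = total_sectors * SECTOR_BYTES

print(f"{total_sectors} sectors, {total_bytes / 1e9:.2f} GB ({total_bytes / 2**30:.2f} GiB)")
```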

Why not SD or SDHC?

The great advantage of CF cards is that they have a ‘true-IDE’ mode, in which they behave like an IDE disk.  This is why CF-to-IDE adapters are so simple; literally they are just a plug adapter with a couple of LEDs thrown in for good measure.  So modifying the DPv2 for CF should be quite straightforward, whereas implementing an SD card controller would require a CPLD with much more capacity and some supporting hardware with it.

Against the Grain

So going completely against the survey, my card as it exists today has only a CF slot, mounted so the CF card is accessible through the expansion slot bracket (but not hot-pluggable).  Originally based on the DPv2, there’s now little left in the design from it other than the CPLD choice (the Xilinx XC9572XL).  By replacing the EEPROM with a flash chip and revising the logic design, it should be quite a bit cheaper than the DPv2.

But as ever, there’s a problem: it needs a custom ISA bracket.  Laser cutting and punch-press are options, but only cost effective at 50 units+.  But for prototyping, I’ve managed to get one made by a friend.

The first challenge is revising Pietja’s CPLD code for my board; since the CF header pinout is ‘muddled up’ compared to the IDE 40-pin header, it made sense to just adjust the CPLD pinout to match, rather than mess about with complex PCB trace routing.  The Xilinx CPLD coding is developed through Xilinx ISE, which helpfully they provide for free (after registration).

The next problem is programming the SST39SFxxx flash chip.  It’s byte-programmable, but because its software data protection scheme (which prevents bad code elsewhere over-writing the contents) isn’t supported by the XTIDE universal BIOS utility, flashing this card will be a two-step process: configuring and saving the ROM image with the xt-ide universal BIOS configuration utility, then flashing the card using my own simple flash utility.
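To show what the software data protection involves, here is a small Python sketch of the bus writes needed to program one byte.  It is not the flash utility itself; the unlock addresses and values are as I understand the SST39SF datasheet (5555h/AAh, 2AAAh/55h, 5555h/A0h) and should be treated as an assumption to check against the datasheet, and the function name is made up.

```python
# Unlock prefix for a byte-program command, per my reading of the SST39SF datasheet (assumption)
UNLOCK_SEQUENCE = ((0x5555, 0xAA), (0x2AAA, 0x55), (0x5555, 0xA0))


def byte_program_writes(address, value):
    """List the (chip address, data) bus writes needed to program a single byte."""
    return list(UNLOCK_SEQUENCE) + [(address, value & 0xFF)]


for addr, data in byte_program_writes(0x0000, 0x55):
    print(f"write {data:02X}h to {addr:04X}h")
```

A stray write from misbehaving code is very unlikely to reproduce the three-step prefix, which is the point of the scheme, but it also means a generic EEPROM writer that simply stores bytes will not work.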

The First Cut…

So the prototype board has been built, the bracket pressed, the flashing utility compiled.  After quite a learning curve with ISE… it works!

So far I’ve only adjusted Pietja’s DPv2 CPLD code for it (changed the pinout and ROM decode), and this is the original XT/IDE design, but it’s a start.  The next step is to generate CPLD code for my revised circuit design that I hope will get writes going as fast as reads do with the ‘chuck-mod’ – but that’s all a job for another day.

Free PCBs

Interested in building one of these?  All the parts can be sourced from Farnell, and the resources you need are right here – some notes on SMT soldering and the XT-CF wiki page through which you can find CPLD code, parts lists and the flashing utility.  You’ll need some way to program the CPLD too (via the JTAG header).

I’ve got a few of these blank PCBs to give away, so leave your email address in a comment (won’t be visible to others) and I’ll send you one for nothing!

From Idea to Reality – CPLD based XT/IDE Controller

ATA (IDE) in most forms is a 16-bit protocol, so an IDE (or ATA/SATA) disk won’t work in an old 8-bit machine like an IBM PC/XT.  8-bit IDE cards did exist ‘back in the day’, but these worked with special 8-bit disks, and sourcing these together now will be prohibitively expensive – if you can find them in working condition that is.

Enter then Andrew Lynch’s now ubiquitous XT/IDE project – a home-assembly 8-bit ISA card that enables any current IDE disk (or, by means of a simple adapter, compact flash, SD-card or SATA disk) to be used in an 8-bit micro.  And there’s a selection of ROM images available too, so all but the earliest 5150 can boot up directly from a modern storage device.

Original XT-IDE Controller (photo: hackaday.com)

The card uses a 28C64 EEPROM for the BIOS and discrete 74LS series logic chips to divide the 16-bit data from the disk into two 8-bit bytes for the old XT bus.  And, it could be built at home with just some basic soldering skills.  But stock of these boards is long gone now.

Dangerous Prototypes XT/IDE board v1

Ian at Dangerous Prototypes spotted more potential in the project.  By using a ‘CPLD’, many or maybe even all the 74LS chips could be eliminated, with the circuit design instead being implemented in software and literally uploaded into the CPLD through JTAG.  In this way, the board could be made smaller and less expensive – and changes to the logic design, for example the ‘chuck-mod’ to improve read speed, could be made without redesigning the board or cutting traces.

Ian’s designs use mostly surface-mount components, which opens up potential for a production run of the boards, and SMT components are generally cheaper than through-hole parts – but many home assemblers will be put off.  Actually assembling them isn’t too bad, with some basic equipment.

So the Dangerous Prototypes XT/IDE board v1 (DPv1) was born, but after a false start with incorrect cut dimensions, the design sat on the shelf for a year.  When I spotted a link on vintage-computer wiki, I just had to get involved.

Dangerous Prototypes XT/IDE board v1a

Ian’s first design used a 5V CPLD, but this part was discontinued (along with many 5V parts) as designers moved to faster, cheaper and cooler 3.3V logic.  Could the old PCs we need this to work with interface with 3.3V logic?  No-one knew.

With a PC/XT in my workshop and an oscilloscope I’d inherited some years before still waiting for its first project, this seemed like a great chance to learn how to use that and hopefully get this moving.  By adding some test points to an original XT/IDE board, the signals in the PC/XT could easily be analysed – and some were already near 3.3V it transpired.  In any case, 3.3V logic levels have been designed to be compatible with 5V TTL.

After some discussion with Ian, he thought it was worth a shot.  As it happens, much later it transpired that ATA-6 disks (c.2001 and newer) themselves use 3.3V signalling too.

So the board was revised with the 3.3V ‘XL’ CPLD – the Dangerous Prototypes XT/IDE board v1a (DPv1a) – basically by adding a tiny regulator.  A few were made, Dangerous Prototypes forum member Pietja and I assembling and testing them.  Pietja quickly found some missing traces in the design, which were corrected with jumper wires – my prototype board being a shameful mess compared to Pietja’s:

XT-IDE V1a Prototypes

But either way, the boards worked!  A couple of drive compatibility problems were resolved by changing the cable-select termination (actually, a minor deviation from the ATA spec inherited from the original XT/IDE board) and with that fixed, we just couldn’t break it – every computer and drive we tried worked.  Pietja produced CPLD code for both the original circuit design and the ‘chuck-mod’ version, which with the appropriate BIOS all worked just fine.

Performance wise of course it’s identical to the original board, throughput in a PC/XT with the Chuck-Mod being about 115KB/s write and 230KB/s read.  It easily out-performs the original ST506 and indeed later RLL drives like the ST-251.

Dangerous Prototypes XT/IDE board v1b

So with the board proven, the missing traces in the V1a design were corrected and a small run of V1b boards produced.

Although still with the CSEL problem, this is easy to correct at assembly by jumpering pin 28 to pin 30 on the underside of the header.  With that change, the board is reasonably cheap, easy to assemble and works perfectly.

Assembled XT-IDE V1b Board

I have a few of these available to buy right now – see the bottom of the page.

Dangerous Prototypes XT/IDE board v2

But Ian wasn’t happy with that – it still needed four 74LS series ICs as the CPLD didn’t have enough pins to accommodate the entire design.  By using a 100-pin CPLD, all the 74 series chips could be eliminated.  So the Dangerous Prototypes XT/IDE board v2 (DPv2):

Dangerous Prototypes XT-IDE v2 Board

With so few components it’s cheap to make – by far the most expensive part being the 28C64 EEPROM.

After a bit of CPLD development, Pietja again came up with the goods and produced a working port of the original XT/IDE circuit logic for the board.  With it, his board came to life with no problems found in initial testing.

So we now have,

I’ve put some parts lists up on the wiki here and some notes on SMT soldering.

Whilst this was all happening, the creators of the original XT/IDE board produced a Version 2, which adds some very cool capabilities based around booting from disk images via a serial lead, but it seems the boards made have now all been sold.  It was also quite expensive to make, especially with the high-speed serial UART.

Assembled DPv1b Boards Available Now!

I have assembled and tested a few DPv1 boards and am making these available for purchase right now:

  • CPLD has been programmed with the ‘Chuck-mod’ design
  • M28C64 EEPROM included, programmed with ‘Chuck-mod’ XTIDE BIOS 2.0-beta

Please see the vintage-computer forums for sale section – I’m making available two v1b boards and my original prototype v1a too.  All three are fully tested, although the prototype is physically fragile.  Proceeds will be fed straight back into the development of the compact flash board I’m currently working on.

Note: The Keystone 9202 ISA bracket is not included, since stock in the UK is non-existent.  These can be obtained from DigiKey in the US.

Coming Soon: XT/IDE – The Future

Config.sys BUFFERS – What Are They?

The CONFIG.SYS BUFFERS statement appeared with MS-DOS 2, which amongst other big changes added fixed disk support.  It also made two changes to the way that DOS handled IO:

  • The file allocation table (FAT) was no longer always held in memory, instead being treated like any other sector; and
  • Instead of a single DOS disk sector buffer, the BUFFERS CONFIG.SYS statement enabled the user to select between 1 and 99 sector buffers.

Since previously (in DOS 1.x) the FAT had been held entirely in memory, no disk IO was needed to find the sectors being requested by open file IO.  But with DOS 2, this could need several IOs – Tim Paterson, an original architect of MS-DOS, published a superb article describing the details in Byte Magazine in 1983, now available here (and cached here).

But why does this matter?

In testing old disks with my simple DiskTest utility, I couldn’t help but notice a wealth of full-stroke IO going on, especially with the 40MB WD-384R since its stepper is so vocal.  The reason is clear: with FAT-16 (DOS 4), the FAT is about 80KB (a list of 40,000 clusters each 16-bits wide), and with each buffer being 512-bytes, there simply isn’t enough buffer space for the FAT.  So, as the random IO test runs, buffers containing FAT data will be victimised by file data on its way through, and hence DOS needs to seek back to the FAT.

This led to what is approaching full-stroke IO in this case, as the drive happened to be nearly full when I ran the test, so the majority (or all) of the test file was near the opposite end of the drive to the FAT.

Tuning

But this leads to a tuning opportunity for the PC/XT disk system, particularly for random IO applications such as databases.

Dividing the drive(s) into smaller partitions doesn’t really help, since DOS selects a cluster size so that the FAT (on a FAT-16 volume) tends to be between 64 and 128KB – so with 99 buffers available (49.5KB), the FAT will never completely fit in the buffer space.  And in any case, with a maximum of only 640KB available on the PC/XT, there simply isn’t much space for a disk cache (each buffer consumes 528 bytes of RAM).
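The buffer arithmetic is easy to check.  A minimal sketch in Python, using only the figures quoted in this post (512 bytes of data and 528 bytes of RAM per buffer, and a FAT-16 of 40,000 two-byte entries); the helper names are illustrative.

```python
SECTOR_BYTES = 512       # data held by one DOS buffer
BUFFER_RAM_BYTES = 528   # RAM consumed per buffer, as quoted above


def buffer_budget(buffers):
    """Return (bytes of sector data cached, bytes of RAM used) for a BUFFERS setting."""
    return buffers * SECTOR_BYTES, buffers * BUFFER_RAM_BYTES


def fat16_bytes(clusters):
    """Size of a FAT-16 table with one 16-bit entry per cluster."""
    return clusters * 2


cache_bytes, ram_bytes = buffer_budget(99)
fat_bytes = fat16_bytes(40_000)
print(f"99 buffers cache {cache_bytes / 1024:.1f}KB and cost {ram_bytes / 1024:.1f}KB of RAM")
print(f"FAT is {fat_bytes / 1024:.1f}KB, fits entirely: {cache_bytes >= fat_bytes}")
```

Even at the maximum setting the buffers hold well under two thirds of that FAT, and they have to share the space with file data.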

Here’s the difference that buffers makes to the WD-384R under DOS 4 with a single 40MB FAT-16 partition:

For random IO, the system performs at over five times the rate with 99 buffers than with a single buffer, and 40% better than with 16 buffers.

For a PC/XT, with its ST-412 drive, DOS 2, and a single 10MB FAT-12 partition, the impact of buffers is rather different – as the FAT is only about 4KB, it can fit comfortably in just 8 buffers:

This all leads me to a few conclusions:

  • For FAT-16, use 99 buffers, unless the memory is really needed by programs
  • If space permits (up to 16MB), use a FAT-12 partition for database files, since fewer buffers are needed for optimum performance
  • On later (AT) machines with even a little XMS, a disk caching utility such as SmartDrv should offer significant further gains

Western Digital WD-384R, or is it a Tandon Drive?

Cracking open the Amstrad PC2286, I was surprised to find a Western Digital WD-384R RLL 40MB disk, as the few references I can find refer to Seagate disks.  Most likely the original disk was replaced, but the WD disk was an unexpected gem to find.  There’s a good history of disks at http://redhill.net.au/d/d-a.html, but basically Western Digital, then a controller manufacturer, bought Tandon to get a disk division to develop drives with integrated controllers – what would ultimately become IDE drives.  Tandon’s disks were simply re-stickered as Western Digital immediately after the takeover, in this case the TM364 becoming the WD-384.

3.5″ Form-Factor

The TM364’s 20MB 2-head sister, the TM262, was one of the first 3.5″ form-factor drives.  It was a pretty reliable classic RLL stepper-motor drive with somewhat relaxed performance – the specifications quoted 85ms average seek, but measured today with Norton Calibrate, even that proves somewhat optimistic.  But this unit is still running well 22 years on, which says a lot about its quality.

Bad Sectors and Interleave

Bad sectors were a reality of 1980s hard drives – internal relocation hadn’t been thought of, and the drives had a handwritten list of known bad sectors on the label.  Once low-level and high-level formatted, a number of bad sectors would almost always be present, the Amstrad handbook stating up to 1% of drive capacity as acceptable.  With about 100K of bad sectors, this drive is today still performing within its as-new spec.

Another complication was the interleave, which spaces out sectors so that the CPU has a minimal wait between sectors on sequential operations.  This disk, when operating through the Amstrad PC2286 with its 80286 CPU and Western Digital 1006 controller, supports an interleave of 1:1, meaning that an entire track can be read in one disk revolution.

Performance

Hooking up the drive to Intel’s IOMeter was possible by using DOS Networking (LanManager) to share the volume to a Windows 2003 Server, but the performance was awful.  The share performance gave about 60KB/s and 2 IOPS only.  LanManager client performs better – achieving over 20 IOPS from a network share.

Because of this, I devised a simple DOS app to do the same types of tests, DiskTest, which simply performs the three core tests (32K sequential read and write, and 8K random) with a 4MB test file and displays the results.

When testing the random workload, the limitations of the DOS FAT file system were quickly evident, with much slower throughput than I was expecting at just 3 IOPS.  The reason was that DOS kept having to seek to the FAT to find the blocks, and since the FAT(s) are at one end of the disk, this created heavy full-stroke IO load on the disk as the test file happened to be near the other end.  Increasing buffers to 99 solved this, as would installing SMARTDRV with even a small read cache.  With 99 buffers, the results are:

  • 32K Sequential Read – 449KB/s
  • 32K Sequential Write – 282KB/s
  • 8K Random 70% read – 7.1 IOPS

Only 7 IOPS still seems low at first sight, but at these transfer rates and with 70% read, this drive takes on average 21 ms just to transfer the 8K, plus the 105ms to find it, and at 3,600 RPM another 8 ms for the data to arrive – on average, 134 ms.  At 7.1 IOPS, the difference (6 ms on average) is probably then because some of the operations could span two tracks, adding another 15ms, and the odd seek to the file allocation table.
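The estimate above can be reproduced with a short Python sketch.  It uses only the numbers in this post (449KB/s read, 282KB/s write, 70% reads, 8K blocks, 105ms seek, 3,600 RPM, 7.1 IOPS) and assumes that the average rotational delay is half a revolution; the function name is illustrative.

```python
def expected_service_ms(read_kbs, write_kbs, read_fraction, block_kb, seek_ms, rpm):
    """Return (transfer, rotational, total) time in ms for one random IO."""
    read_ms = block_kb / read_kbs * 1000
    write_ms = block_kb / write_kbs * 1000
    transfer_ms = read_fraction * read_ms + (1 - read_fraction) * write_ms
    rotational_ms = 60_000 / rpm / 2          # half a revolution on average (assumed)
    return transfer_ms, rotational_ms, seek_ms + transfer_ms + rotational_ms


transfer, rotation, total = expected_service_ms(449, 282, 0.7, 8, 105, 3600)
observed = 1000 / 7.1                         # ms per IO at the measured 7.1 IOPS

print(f"transfer {transfer:.1f}ms + rotation {rotation:.1f}ms + seek 105ms = {total:.1f}ms")
print(f"observed {observed:.1f}ms per IO, unexplained {observed - total:.1f}ms")
```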

The performance with real apps really does feel lethargic with this drive, but adding a small cache with buffers=99 or SMARTDRV makes a huge difference, and software from the time, such as Windows 2, then chugs along at a usable pace.