When I first started working, I was very interested in learning about the Linux kernel, specifically the TCP/IP stack and the inner workings of an Ethernet controller. To learn more, I picked one of the most widely used Ethernet controllers at that time, the Intel 82559 10/100 Fast Ethernet Controller, and one of its open-source drivers, eepro100.
Though the 82559 chip is no longer manufactured and the eepro100 driver has been deprecated, this article still serves as a guide to how an Ethernet controller works. We’ll look at how the eepro100 driver interfaces with the Intel 82559 chipset and how the 82559 converts the packets sent by the driver into signals transmitted over a physical Ethernet cable.
The mighty 82559
The 82559 is an Intel Ethernet chipset. It supports 10/100 Mbps full-duplex data communication over twisted-pair cable. This is a high-level block diagram of the 82559.
The most important subsystems of 82559 are:
- A parallel subsystem (shown in green).
- A FIFO subsystem (shown in red).
- The 10/100 Mbps Carrier Sense Multiple Access with Collision Detect (CSMA/CD) unit (shown in blue).
- The 10/100 Mbps physical layer (PHY) unit (shown in black).
The parallel subsystem
The parallel subsystem is responsible for interfacing the chipset with the motherboard via the PCI bus. It also controls and executes all the chipset’s functions via a Micro-Machine.
As a PCI device, 82559 can operate in two modes:
- As a PCI target (slave mode). In slave mode, the 82559 is completely controlled by the host CPU. The CPU initiates all transmit and receive actions when the 82559 is in slave mode.
- As a PCI bus master. For processing the transmit and receive frames, the 82559 operates as a master on the PCI bus. It needs no help from the host CPU to read/write memory or other resources and can work independently.
The micromachine is an embedded processing unit. Instructions for carrying out all the 82559’s functions are embedded in a microcode ROM within the micromachine. The micromachine is divided into two units:
- Receive Unit (RU)
- Command Unit (CU). The CU is also the transmit unit.
These two units operate independently and concurrently. Control is switched between the two units according to the microcode instruction flow. The independence of the Receive and Command units in the micromachine allows the 82559 to execute commands and receive incoming frames simultaneously, with no real-time CPU intervention.
The 82559 also interfaces with an external Flash memory and an external serial EEPROM. The Flash memory may be used for remote boot functions, network statistics, diagnostics and management functions. The EEPROM is used to store relevant information for a LAN connection such as node address (MAC Address), as well as board manufacturing and configuration information.
FIFO Subsystem
The 82559 FIFO (First In, First Out) subsystem consists of a 3 Kbyte transmit FIFO and 3 Kbyte receive FIFO. Each FIFO is unidirectional and independent of the other. The FIFO subsystem serves as the interface between the 82559 parallel side and the serial CSMA/CD unit. It provides a temporary buffer storage area for frames as they are either being received or transmitted by the 82559. Transmit frames can be queued within the transmit FIFO, allowing back-to-back transmission within the minimum Interframe Spacing (IFS). Transmissions resulting in errors (collision detection or data underruns) are re-transmitted directly from the 82559 FIFO eliminating the need to re-access this data from the host system.
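The first-in, first-out behaviour described above can be modelled in software as a ring buffer. The sketch below is only an illustration of the data structure, not of the 82559's internal design; the names `struct fifo`, `fifo_put()` and `fifo_get()` are made up for this example, and only the 3 Kbyte size comes from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_SIZE 3072               /* 3 Kbyte, the size quoted for each 82559 FIFO */

struct fifo {                        /* hypothetical software model */
    uint8_t data[FIFO_SIZE];
    size_t head;                     /* index of the next byte to read */
    size_t count;                    /* number of bytes currently stored */
};

/* Append up to len bytes; returns how many were accepted before the FIFO filled. */
size_t fifo_put(struct fifo *f, const uint8_t *src, size_t len)
{
    size_t n = 0;
    while (n < len && f->count < FIFO_SIZE) {
        f->data[(f->head + f->count) % FIFO_SIZE] = src[n++];
        f->count++;
    }
    return n;
}

/* Remove up to len bytes in arrival order; returns how many were read. */
size_t fifo_get(struct fifo *f, uint8_t *dst, size_t len)
{
    size_t n = 0;
    while (n < len && f->count > 0) {
        dst[n++] = f->data[f->head];
        f->head = (f->head + 1) % FIFO_SIZE;
        f->count--;
    }
    return n;
}
```

Because the writer and the reader only share the `head` and `count` bookkeeping, one side can fill the buffer while the other drains it, which is the decoupling the FIFO provides between the PCI side and the serial side.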
CSMA/CD unit
The CSMA/CD unit of the 82559 allows it to be connected to either a 10 or 100 Mbps Ethernet network. The CSMA/CD unit performs all of the functions of the 802.3 protocol such as frame formatting, frame stripping, collision handling, deferral to link traffic, etc. The CSMA/CD unit can also be placed in a full duplex mode which allows simultaneous transmission and reception of frames.
Physical Unit (PHY)
The Physical Layer (PHY) unit of the 82559 is where the digital data is converted to a signal that can propagate over the network wires. To make the actual connection to the network, additional components such as transformers and impedances are needed. These additional components are external to the 82559.
Accessing 82559 as a PCI device
PCI peripheral boards can be accessed using three different address spaces: memory locations, I/O ports, and configuration registers.
- The memory and I/O port address space is shared by all devices on a PCI bus (i.e., when you access a memory location, all the devices see the bus cycle at the same time). A driver can read memory and I/O regions via inb, readb, and so forth.
- The configuration space, on the other hand, exploits geographical addressing, i.e. each PCI slot is uniquely addressed (by a 16-bit address), thus eliminating collisions when configuring devices. To access the configuration space of the 82559, the full configuration address (bus, slot, function, offset) is written to an I/O port (on x86 PCs, CONFIG_ADDRESS = 0xCF8) and then the 32-bit word at this address can be read or written through another location (on x86 PCs, CONFIG_DATA = 0xCFC). These two ports belong to the host platform, not to the 82559 itself.
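As a rough sketch of the geographical addressing described above, the function below builds the 32-bit value that is written to CONFIG_ADDRESS under the standard x86 PCI configuration mechanism. The function name `pci_config_address()` is invented for this example.

```c
#include <stdint.h>

/* Build the value written to CONFIG_ADDRESS (0xCF8).
 * Layout: bit 31     = enable
 *         bits 23-16 = bus number
 *         bits 15-11 = slot (device) number
 *         bits 10-8  = function number
 *         bits 7-2   = register offset, dword aligned */
uint32_t pci_config_address(uint8_t bus, uint8_t slot,
                            uint8_t func, uint8_t offset)
{
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(slot & 0x1f) << 11)
         | ((uint32_t)(func & 0x07) << 8)
         | (uint32_t)(offset & 0xfc);
}
```

For example, the register at offset 10H of function 3 in slot 2 on bus 1 is selected with `pci_config_address(1, 2, 3, 0x10)`. Actually writing the value to port 0xCF8 and reading 0xCFC needs I/O privileges, so the sketch stops at computing the address.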
After a PCI device is powered on, the hardware remains in an inactive state and will only respond to configuration transactions. This is because, at power on, the device does not have its memory and I/O ports mapped into the computer’s address space. Every other device-specific feature, such as interrupt reporting, is disabled as well.
After power on, the BIOS must first scan the PCI bus to determine what PCI devices exist and what configuration requirements they have. In order to facilitate this process, all PCI devices, including the 82559, must implement a base set of configuration registers as defined by the PCI standard. The registers defined by the 82559 are shown in the figure below.
The BIOS reads the Vendor ID, Device ID and Class registers in order to detect the device and its type. The 82559, being an Intel device, returns a hard-coded 8086H for Vendor ID.
Memory & IO Mapping the 82559 device.
Having detected the 82559, the BIOS then accesses its base address configuration registers to determine how many blocks of memory and/or I/O space the device requires. A Base Address Register (BAR) is 32 bits wide and there can be up to 6 BARs per device. The 82559 defines 3 types of BARs: the Control/Status Registers (CSR), Flash, and Expansion ROM, as shown in the figure above.
Bit zero in all base registers is read only and is used to determine whether the register maps into memory (0) or I/O space (1). Figure above shows the layout of a BAR for memory mapping.
The 82559 contains three BARs, two requesting memory-mapped resources and one requesting I/O mapping. Specifically, the Control and Status Register (CSR) block is both memory mapped (CSR Memory Mapped Base Address Register: 10H), to anywhere within the 32-bit memory address space, and I/O mapped (CSR I/O Mapped Base Address Register: 14H). It is up to the driver (eepro100) to determine which BAR (I/O or memory) to use to access the 82559 Control/Status registers. The size of the memory space is 4 Kbytes and that of the I/O space is 32 bytes. The 82559 also requires one BAR (Flash Memory Mapped Base Address Register: 18H) to map accesses to an optional Flash memory.
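To make the BAR layout more concrete, here is a small sketch of how a BAR value can be decoded and how the requested size is usually derived (firmware writes all ones to the BAR, reads it back, and looks at which address bits the device left at zero). The helper names are invented for this example, and the read-back values in the usage note are assumptions chosen to match the 4 Kbyte and 32 byte sizes quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BAR selects the address space: 0 = memory, 1 = I/O. */
bool bar_is_io(uint32_t bar)
{
    return (bar & 0x1u) != 0;
}

/* Strip the flag bits to get the base address:
 * I/O BARs use the low 2 bits for flags, memory BARs the low 4 bits. */
uint32_t bar_base(uint32_t bar)
{
    return bar_is_io(bar) ? (bar & ~0x3u) : (bar & ~0xFu);
}

/* Given the value read back after writing 0xFFFFFFFF to a BAR,
 * return the size of the region the device is asking for. */
uint32_t bar_size(uint32_t readback)
{
    return ~bar_base(readback) + 1u;
}
```

With these helpers, a memory BAR that reads back 0xFFFFF000 asks for `bar_size(0xFFFFF000) == 4096` bytes, and an I/O BAR that reads back 0xFFFFFFE1 asks for 32 bytes.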
After determining the types of mapping and the amount of memory/I/O space requested by the BARs, the BIOS maps the device’s I/O and memory regions into available address ranges and proceeds with system boot.
The Kernel PCI Initialization
As described above, for Intel based systems, the system BIOS which ran at boot time has already fully configured the PCI system. This leaves the Linux kernel with little to do other than remap that configuration.
The PCI device driver (pci.c) starts by scanning PCI Buses and creates a pci_dev for every device (including PCI-to-PCI bridges) and pci_bus for every bus it finds. These structures are linked together into a tree that mimics the actual PCI topology.
At this stage, the BIOS has recognized the 82559 and configured its PCI configuration space assigning it unique memory and IO space and the Linux kernel has created a pci_dev data structure defining 82559.
82559 PCI Initialization
When the eepro100 driver module is loaded into the kernel, the driver registers itself as a PCI driver by calling pci_register_driver(). Implicitly passed to pci_register_driver() is a table of all supported devices (eepro100_pci_tbl). This table lists all the chipsets this driver is capable of driving: 82557, 82558, 82559, 82801, etc.
2371
2372 static struct pci_device_id eepro100_pci_tbl[] __devinitdata = {
2373 { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82557,
2374 PCI_ANY_ID, PCI_ANY_ID, },
2375 { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82559ER,
2376 PCI_ANY_ID, PCI_ANY_ID, },
2377 { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801BA_7,
2378 PCI_ANY_ID, PCI_ANY_ID, },
2379 { PCI_VENDOR_ID_INTEL, 0x1029, PCI_ANY_ID, PCI_ANY_ID, },
2380 { PCI_VENDOR_ID_INTEL, 0x1030, PCI_ANY_ID, PCI_ANY_ID, },
2381 { PCI_VENDOR_ID_INTEL, 0x1031, PCI_ANY_ID, PCI_ANY_ID, },
2382 { PCI_VENDOR_ID_INTEL, 0x1032, PCI_ANY_ID, PCI_ANY_ID, },
2383 { PCI_VENDOR_ID_INTEL, 0x1033, PCI_ANY_ID, PCI_ANY_ID, },
2384 { PCI_VENDOR_ID_INTEL, 0x1034, PCI_ANY_ID, PCI_ANY_ID, },
2385 { PCI_VENDOR_ID_INTEL, 0x1035, PCI_ANY_ID, PCI_ANY_ID, },
2386 { PCI_VENDOR_ID_INTEL, 0x1036, PCI_ANY_ID, PCI_ANY_ID, },
2387 { PCI_VENDOR_ID_INTEL, 0x1037, PCI_ANY_ID, PCI_ANY_ID, },
2388 { PCI_VENDOR_ID_INTEL, 0x1038, PCI_ANY_ID, PCI_ANY_ID, },
2389 { PCI_VENDOR_ID_INTEL, 0x1039, PCI_ANY_ID, PCI_ANY_ID, },
2390 { PCI_VENDOR_ID_INTEL, 0x103A, PCI_ANY_ID, PCI_ANY_ID, },
2391 { PCI_VENDOR_ID_INTEL, 0x103B, PCI_ANY_ID, PCI_ANY_ID, },
2392 { PCI_VENDOR_ID_INTEL, 0x103C, PCI_ANY_ID, PCI_ANY_ID, },
2393 { PCI_VENDOR_ID_INTEL, 0x103D, PCI_ANY_ID, PCI_ANY_ID, },
2394 { PCI_VENDOR_ID_INTEL, 0x103E, PCI_ANY_ID, PCI_ANY_ID, },
2395 { PCI_VENDOR_ID_INTEL, 0x1050, PCI_ANY_ID, PCI_ANY_ID, },
2396 { PCI_VENDOR_ID_INTEL, 0x1059, PCI_ANY_ID, PCI_ANY_ID, },
2397 { PCI_VENDOR_ID_INTEL, 0x1227, PCI_ANY_ID, PCI_ANY_ID, },
2398 { PCI_VENDOR_ID_INTEL, 0x1228, PCI_ANY_ID, PCI_ANY_ID, },
2399 { PCI_VENDOR_ID_INTEL, 0x2449, PCI_ANY_ID, PCI_ANY_ID, },
2400 { PCI_VENDOR_ID_INTEL, 0x2459, PCI_ANY_ID, PCI_ANY_ID, },
2401 { PCI_VENDOR_ID_INTEL, 0x245D, PCI_ANY_ID, PCI_ANY_ID, },
2402 { PCI_VENDOR_ID_INTEL, 0x5200, PCI_ANY_ID, PCI_ANY_ID, },
2403 { PCI_VENDOR_ID_INTEL, 0x5201, PCI_ANY_ID, PCI_ANY_ID, },
2404 { 0,}
2405 };
2406 MODULE_DEVICE_TABLE(pci, eepro100_pci_tbl);
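To give a feel for how a table like this is consulted, here is a simplified sketch of ID matching with wildcards. It is not the kernel's implementation: `struct id_entry`, `ANY_ID` and `match_id()` are stand-ins invented for this example, modelled loosely on `struct pci_device_id` and `PCI_ANY_ID`.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xFFFFFFFFu          /* wildcard, plays the role of PCI_ANY_ID */

struct id_entry {                   /* simplified stand-in for struct pci_device_id */
    uint32_t vendor, device, subvendor, subdevice;
};

/* A table field matches if it is the wildcard or equals the device's value. */
static int field_matches(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

/* Return the first table entry that matches the device, or NULL.
 * The table must end with an all-zero entry, like eepro100_pci_tbl. */
const struct id_entry *match_id(const struct id_entry *tbl,
                                uint32_t vendor, uint32_t device,
                                uint32_t subvendor, uint32_t subdevice)
{
    for (; tbl->vendor != 0; tbl++) {
        if (field_matches(tbl->vendor, vendor) &&
            field_matches(tbl->device, device) &&
            field_matches(tbl->subvendor, subvendor) &&
            field_matches(tbl->subdevice, subdevice))
            return tbl;
    }
    return NULL;
}
```

Since every entry in the table above uses the wildcard for the subsystem fields, only the vendor and device IDs decide whether the driver claims a device.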
After registering the driver, pci_register_driver() probes [pci_device_probe()] the PCI device tree for all unclaimed PCI devices. Chances are that one of those devices could be an eepro100 compatible device. When an unclaimed device is found, pci_bus_match() and pci_match_device() are called to check if this unclaimed device is an eepro100 compliant PCI device.
pci_match_device() checks whether the PCI_VENDOR_ID, PCI_DEVICE_ID, PCI_SUBVENDOR_ID and PCI_SUBDEVICE_ID entries in eepro100_pci_tbl match the values in the device’s configuration header (see the 82559 configuration space figure above). If a match is found, eepro100’s probe function, eepro100_init_one(), is called to reset/probe the new device. After the probe is complete, the device is marked as claimed by eepro100.
static int __devinit eepro100_init_one (struct pci_dev *pdev,
        const struct pci_device_id *ent)
{
    unsigned long ioaddr;
    int irq;
    int acpi_idle_state = 0, pm;
    static int cards_found /* = 0 */;

#ifndef MODULE
    /* when built-in, we only print version if device is found */
    static int did_version;
    if (did_version++ == 0)
        printk(version);
#endif

    /* save power state before pci_enable_device overwrites it */
    pm = pci_find_capability(pdev, PCI_CAP_ID_PM);
    if (pm) {
        u16 pwr_command;
        pci_read_config_word(pdev, pm + PCI_PM_CTRL, &pwr_command);
        acpi_idle_state = pwr_command & PCI_PM_CTRL_STATE_MASK;
    }

    if (pci_enable_device(pdev))
        goto err_out_free_mmio_region;

    pci_set_master(pdev);

    if (!request_region(pci_resource_start(pdev, 1),
            pci_resource_len(pdev, 1), "eepro100")) {
        printk (KERN_ERR "eepro100: cannot reserve I/O ports\n");
        goto err_out_none;
    }
    if (!request_mem_region(pci_resource_start(pdev, 0),
            pci_resource_len(pdev, 0), "eepro100")) {
        printk (KERN_ERR "eepro100: cannot reserve MMIO region\n");
        goto err_out_free_pio_region;
    }

    irq = pdev->irq;
#ifdef USE_IO
    ioaddr = pci_resource_start(pdev, 1);
    if (DEBUG & NETIF_MSG_PROBE)
        printk("Found Intel i82557 PCI Speedo at I/O %#lx, IRQ %d.\n",
            ioaddr, irq);
#else
    ioaddr = (unsigned long)ioremap(pci_resource_start(pdev, 0),
            pci_resource_len(pdev, 0));
    if (!ioaddr) {
        printk (KERN_ERR "eepro100: cannot remap MMIO region %lx @ %lx\n",
            pci_resource_len(pdev, 0), pci_resource_start(pdev, 0));
        goto err_out_free_mmio_region;
    }
    if (DEBUG & NETIF_MSG_PROBE)
        printk("Found Intel i82557 PCI Speedo, MMIO at %#lx, IRQ %d.\n",
            pci_resource_start(pdev, 0), irq);
#endif

    if (speedo_found1(pdev, ioaddr, cards_found, acpi_idle_state) == 0)
        cards_found++;
    else
        goto err_out_iounmap;

    return 0;

err_out_iounmap: ;
#ifndef USE_IO
    iounmap ((void *)ioaddr);
#endif
err_out_free_mmio_region:
    release_mem_region(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0));
err_out_free_pio_region:
    release_region(pci_resource_start(pdev, 1), pci_resource_len(pdev, 1));
err_out_none:
    return -ENODEV;
}
pci_find_capability(): Every device that supports PCI power management, including the 82559, has an 8-byte capability field in its PCI configuration space (see addresses DCh – E0h in the figure above). This field is used to describe and control the standard PCI power management features. The PCI PM spec defines 4 operating states for devices, D0 – D3. The higher the number, the less power the device consumes, but the longer it takes to return to the operational state (D0). The 82559 supports all 4 power states. eepro100_init_one() uses pci_find_capability() to locate this field and saves the current power state, after which pci_enable_device() (via pci_set_power_state()) activates the 82559 by switching it to the D0 state.
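The power state lives in the low two bits of the power management control/status word, which is why the driver masks the value it reads with PCI_PM_CTRL_STATE_MASK. A minimal sketch of that bit handling follows; the helper names are invented for this example, and the mask value 0x0003 mirrors the kernel constant.

```c
#include <stdint.h>

#define PM_CTRL_STATE_MASK 0x0003u   /* low two bits hold the D-state (D0..D3) */

/* Return the D-state (0..3) encoded in a PM control/status word. */
unsigned pm_state(uint16_t pm_ctrl)
{
    return pm_ctrl & PM_CTRL_STATE_MASK;
}

/* Return the control word with the D-state replaced and all other bits preserved. */
uint16_t pm_with_state(uint16_t pm_ctrl, unsigned state)
{
    return (uint16_t)((pm_ctrl & ~PM_CTRL_STATE_MASK) |
                      (state & PM_CTRL_STATE_MASK));
}
```

Saving `pm_state()` before enabling the device and restoring it later corresponds to what the driver does with acpi_idle_state.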
pci_set_master(): If the device has bus mastering capability, during bootup the BIOS can read two of its configuration registers (Minimum Grant register: Min_Gnt and Maximum Latency register: Max_Lat, see configuration registers in the above figure) to determine how quickly it requires access to the PCI bus when it asserts REQ# pin and the average duration of its transfer when it has acquired ownership of the bus. The BIOS can utilize this information to program the bus master’s latency timer register and the PCI bus arbiter to provide the optimum PCI bus utilization. For 82559, the default value of Minimum Grant Register is 08H and Maximum Latency Register is 18H. pci_set_master() is called to enable 82559 to act as a bus master.
During boot, the BIOS allocated unique memory and I/O regions for accessing the 82559’s Control/Status Registers. For the driver to use these regions, they have to be reserved in the kernel by marking them as BUSY (to prevent other drivers from accessing the same regions). eepro100_init_one() reserves the BIOS-assigned I/O port region using request_region() (cat /proc/ioports to see a list of all reserved I/O ports). Similarly, request_mem_region() is called to reserve the memory-mapped region. In the listing above this is done for the two CSR regions: BAR 1 (I/O) and BAR 0 (memory).
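The idea behind marking a region BUSY can be shown with a toy model: keep a list of reserved ranges and refuse any request that overlaps one of them. This is only an illustration of the concept, not how the kernel's resource tree is implemented; `reserve_region()` and `free_region()` are names invented for this example.

```c
#include <stddef.h>

#define MAX_REGIONS 16

struct region {
    unsigned long start, len;
    const char *owner;               /* who holds the reservation */
};

static struct region busy[MAX_REGIONS];
static size_t nbusy;

/* Reserve [start, start + len). Returns 0 on success, or -1 if the range
 * overlaps an existing reservation, is empty, or the table is full. */
int reserve_region(unsigned long start, unsigned long len, const char *owner)
{
    if (len == 0 || nbusy == MAX_REGIONS)
        return -1;
    for (size_t i = 0; i < nbusy; i++) {
        unsigned long s = busy[i].start;
        unsigned long e = s + busy[i].len;
        if (start < e && s < start + len)
            return -1;               /* overlap: the range is already BUSY */
    }
    busy[nbusy++] = (struct region){ start, len, owner };
    return 0;
}

/* Release a reservation made with exactly this start and length. */
void free_region(unsigned long start, unsigned long len)
{
    for (size_t i = 0; i < nbusy; i++) {
        if (busy[i].start == start && busy[i].len == len) {
            busy[i] = busy[--nbusy]; /* move the last entry into the gap */
            return;
        }
    }
}
```

A second driver asking for a range that overlaps one already held gets -1 back, which is the situation the "cannot reserve I/O ports" error path in the driver handles.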
The 82559 is now physically enabled and is ready to start receiving and transmitting ethernet frames. The driver now prepares the kernel to start using this device for network access via speedo_found1().
static int __devinit speedo_found1(struct pci_dev *pdev,
        long ioaddr, int card_idx, int acpi_idle_state)
{
    struct net_device *dev;
    struct speedo_private *sp;
    const char *product;
    int i, option;
    u16 eeprom[0x100];
    int size;
    void *tx_ring_space;
    dma_addr_t tx_ring_dma;

    size = TX_RING_SIZE * sizeof(struct TxFD) + sizeof(struct speedo_stats);
    tx_ring_space = pci_alloc_consistent(pdev, size, &tx_ring_dma);
    if (tx_ring_space == NULL)
        return -1;

    dev = init_etherdev(NULL, sizeof(struct speedo_private));
    if (dev == NULL) {
        printk(KERN_ERR "eepro100: Could not allocate ethernet device.\n");
        pci_free_consistent(pdev, size, tx_ring_space, tx_ring_dma);
        return -1;
    }

    SET_MODULE_OWNER(dev);

    if (dev->mem_start > 0)
        option = dev->mem_start;
    else if (card_idx >= 0 && options[card_idx] >= 0)
        option = options[card_idx];
    else
        option = 0;

    /* Read the station address EEPROM before doing the reset.
       Nominally his should even be done before accepting the device, but
       then we wouldn't have a device name with which to report the error.
       The size test is for 6 bit vs. 8 bit address serial EEPROMs.
    */
    {
        unsigned long iobase;
        int read_cmd, ee_size;
        u16 sum;
        int j;

        /* Use IO only to avoid postponed writes and satisfy EEPROM timing
           requirements. */
        iobase = pci_resource_start(pdev, 1);
        if ((do_eeprom_cmd(iobase, EE_READ_CMD << 24, 27) & 0xffe0000)
            == 0xffe0000) {
            ee_size = 0x100;
            read_cmd = EE_READ_CMD << 24;
        } else {
            ee_size = 0x40;
            read_cmd = EE_READ_CMD << 22;
        }

        for (j = 0, i = 0, sum = 0; i < ee_size; i++) {
            u16 value = do_eeprom_cmd(iobase, read_cmd | (i << 16), 27);
            eeprom[i] = value;
            sum += value;
            if (i < 3) {
                dev->dev_addr[j++] = value;
                dev->dev_addr[j++] = value >> 8;
            }
        }
        if (sum != 0xBABA)
            printk(KERN_WARNING "%s: Invalid EEPROM checksum %#4.4x, "
                "check settings before activating this device!\n",
                dev->name, sum);
        /* Don't unregister_netdev(dev); as the EEPro may actually be
           usable, especially if the MAC address is set later.
           On the other hand, it may be unusable if MDI data is corrupted. */
    }

    /* Reset the chip: stop Tx and Rx processes and clear counters.
       This takes less than 10usec and will easily finish before the next
       action. */
    outl(PortReset, ioaddr + SCBPort);
    inl(ioaddr + SCBPort);
    udelay(10);

    if (eeprom[3] & 0x0100)
        product = "OEM i82557/i82558 10/100 Ethernet";
    else
        product = pdev->name;

    printk(KERN_INFO "%s: %s, ", dev->name, product);

    for (i = 0; i < 5; i++)
        printk("%2.2X:", dev->dev_addr[i]);
    printk("%2.2X, ", dev->dev_addr[i]);
#ifdef USE_IO
    printk("I/O at %#3lx, ", ioaddr);
#endif
    printk("IRQ %d.\n", pdev->irq);

    /* we must initialize base_addr early, for mdio_{read,write} */
    dev->base_addr = ioaddr;

#if 1 || defined(kernel_bloat)
    /* OK, this is pure kernel bloat. I don't like it when other drivers
       waste non-pageable kernel space to emit similar messages, but I need
       them for bug reports. */
    {
        const char *connectors[] = {" RJ45", " BNC", " AUI", " MII"};
        /* The self-test results must be paragraph aligned. */
        volatile s32 *self_test_results;
        int boguscnt = 16000; /* Timeout for set-test. */
        if ((eeprom[3] & 0x03) != 0x03)
            printk(KERN_INFO " Receiver lock-up bug exists -- enabling"
                " work-around.\n");
        printk(KERN_INFO " Board assembly %4.4x%2.2x-%3.3d, Physical"
            " connectors present:",
            eeprom[8], eeprom[9]>>8, eeprom[9] & 0xff);
        for (i = 0; i < 4; i++)
            if (eeprom[5] & (1<<i))
                printk(connectors[i]);
        printk("\n"KERN_INFO" Primary interface chip %s PHY #%d.\n",
            phys[(eeprom[6]>>8)&15], eeprom[6] & 0x1f);
        if (eeprom[7] & 0x0700)
            printk(KERN_INFO " Secondary interface chip %s.\n",
                phys[(eeprom[7]>>8)&7]);
        if (((eeprom[6]>>8) & 0x3f) == DP83840
            || ((eeprom[6]>>8) & 0x3f) == DP83840A) {
            int mdi_reg23 = mdio_read(dev, eeprom[6] & 0x1f, 23) | 0x0422;
            if (congenb)
                mdi_reg23 |= 0x0100;
            printk(KERN_INFO" DP83840 specific setup, setting register 23 to %4.4x.\n",
                mdi_reg23);
            mdio_write(dev, eeprom[6] & 0x1f, 23, mdi_reg23);
        }
        if ((option >= 0) && (option & 0x70)) {
            printk(KERN_INFO " Forcing %dMbs %s-duplex operation.\n",
                (option & 0x20 ? 100 : 10),
                (option & 0x10 ? "full" : "half"));
            mdio_write(dev, eeprom[6] & 0x1f, MII_BMCR,
                ((option & 0x20) ? 0x2000 : 0) | /* 100mbps? */
                ((option & 0x10) ? 0x0100 : 0)); /* Full duplex? */
        }

        /* Perform a system self-test. */
        self_test_results = (s32*) ((((long) tx_ring_space) + 15) & ~0xf);
        self_test_results[0] = 0;
        self_test_results[1] = -1;
        outl(tx_ring_dma | PortSelfTest, ioaddr + SCBPort);
        do {
            udelay(10);
        } while (self_test_results[1] == -1 && --boguscnt >= 0);

        if (boguscnt < 0) { /* Test optimized out. */
            printk(KERN_ERR "Self test failed, status %8.8x:\n"
                KERN_ERR " Failure to initialize the i82557.\n"
                KERN_ERR " Verify that the card is a bus-master"
                " capable slot.\n",
                self_test_results[1]);
        } else
            printk(KERN_INFO " General self-test: %s.\n"
                KERN_INFO " Serial sub-system self-test: %s.\n"
                KERN_INFO " Internal registers self-test: %s.\n"
                KERN_INFO " ROM checksum self-test: %s (%#8.8x).\n",
                self_test_results[1] & 0x1000 ? "failed" : "passed",
                self_test_results[1] & 0x0020 ? "failed" : "passed",
                self_test_results[1] & 0x0008 ? "failed" : "passed",
                self_test_results[1] & 0x0004 ? "failed" : "passed",
                self_test_results[0]);
    }
#endif /* kernel_bloat */

    outl(PortReset, ioaddr + SCBPort);
    inl(ioaddr + SCBPort);
    udelay(10);

    /* Return the chip to its original power state. */
    pci_set_power_state(pdev, acpi_idle_state);

    pci_set_drvdata (pdev, dev);

    dev->irq = pdev->irq;

    sp = dev->priv;
    sp->pdev = pdev;
    sp->msg_enable = DEBUG;
    sp->acpi_pwr = acpi_idle_state;
    sp->tx_ring = tx_ring_space;
    sp->tx_ring_dma = tx_ring_dma;
    sp->lstats = (struct speedo_stats *)(sp->tx_ring + TX_RING_SIZE);
    sp->lstats_dma = TX_RING_ELEM_DMA(sp, TX_RING_SIZE);
    init_timer(&sp->timer); /* used in ioctl() */
    spin_lock_init(&sp->lock);

    sp->mii_if.full_duplex = option >= 0 && (option & 0x10) ? 1 : 0;
    if (card_idx >= 0) {
        if (full_duplex[card_idx] >= 0)
            sp->mii_if.full_duplex = full_duplex[card_idx];
    }
    sp->default_port = option >= 0 ? (option & 0x0f) : 0;

    sp->phy[0] = eeprom[6];
    sp->phy[1] = eeprom[7];

    sp->mii_if.phy_id = eeprom[6] & 0x1f;
    sp->mii_if.phy_id_mask = 0x1f;
    sp->mii_if.reg_num_mask = 0x1f;
    sp->mii_if.dev = dev;
    sp->mii_if.mdio_read = mdio_read;
    sp->mii_if.mdio_write = mdio_write;

    sp->rx_bug = (eeprom[3] & 0x03) == 3 ? 0 : 1;
    if (((pdev->device > 0x1030 && (pdev->device < 0x103F)))
        || (pdev->device == 0x2449) || (pdev->device == 0x2459)
        || (pdev->device == 0x245D)) {
        sp->chip_id = 1;
    }

    if (sp->rx_bug)
        printk(KERN_INFO " Receiver lock-up workaround activated.\n");

    /* The Speedo-specific entries in the device structure. */
    dev->open = &speedo_open;
    dev->hard_start_xmit = &speedo_start_xmit;
    netif_set_tx_timeout(dev, &speedo_tx_timeout, TX_TIMEOUT);
    dev->stop = &speedo_close;
    dev->get_stats = &speedo_get_stats;
    dev->set_multicast_list = &set_rx_mode;
    dev->do_ioctl = &speedo_ioctl;

    return 0;
}
speedo_found1(): The kernel needs to know that this is an Ethernet device and that it can use this new PCI device to send/receive data over the network. For this, a device-specific structure, net_device, is created and registered with the kernel [register_netdevice()]; in the listing above, both steps happen inside init_etherdev(). net_device contains the device name, its MAC address, options like full-duplex, the interrupt number (IRQ) and pointers to functions for executing all the device functions.
Every Ethernet device found should have a unique name; on Linux, Ethernet devices are named eth0, eth1, eth2, and so on. dev_alloc_name() allocates a name for this device and sets it in the net_device structure.
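The naming scheme amounts to picking the lowest unused number. The sketch below shows that idea in user-space C; it is not the kernel's dev_alloc_name(), and the function name `alloc_eth_name()` and its `limit` parameter are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Find the lowest free "ethN" name, given the names already in use.
 * Writes the name into out and returns N, or -1 if no name below
 * limit is free. */
int alloc_eth_name(const char *const used[], int nused,
                   char *out, size_t outlen, int limit)
{
    for (int n = 0; n < limit; n++) {
        int taken = 0;
        snprintf(out, outlen, "eth%d", n);
        for (int i = 0; i < nused; i++) {
            if (strcmp(used[i], out) == 0) {
                taken = 1;
                break;
            }
        }
        if (!taken)
            return n;
    }
    return -1;
}
```

With eth0 and eth1 already present, the next card found would be given the name eth2.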
Every 802.3 device has a unique 48-bit MAC address assigned to it. This address is not hardcoded in the 82559, but is stored by the board manufacturer in a non-volatile form, such as in the EEPROM or Flash EPROM outside the 82559.
82559 expects the EEPROM format to be as shown below.
The 82559 automatically reads five words (0H, 1H, 2H, AH, and DH) from the EEPROM during bootup. The MAC address is extracted from 0H, 1H & 2H. The rest of the EEPROM map contains device options like type of connector, the device type, PHY device ID etc.
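The way the driver turns EEPROM words into a MAC address and validates the image can be sketched in isolation. The two helpers below mirror the loop in speedo_found1() (low byte of each word first, 16-bit sum compared against 0xBABA), but the function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the station address out of EEPROM words 0..2.
 * Each 16-bit word holds two address bytes, low byte first. */
void eeprom_mac(const uint16_t *eeprom, uint8_t mac[6])
{
    for (int i = 0; i < 3; i++) {
        mac[2 * i]     = (uint8_t)(eeprom[i] & 0xff);
        mac[2 * i + 1] = (uint8_t)(eeprom[i] >> 8);
    }
}

/* The 16-bit sum of all EEPROM words is expected to equal 0xBABA. */
int eeprom_checksum_ok(const uint16_t *eeprom, size_t nwords)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum = (uint16_t)(sum + eeprom[i]);
    return sum == 0xBABA;
}
```

For example, EEPROM words 0x1100, 0x3322 and 0x5544 yield the address 00:11:22:33:44:55.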
speedo_found1() then proceeds to reset the 82559 chip using the PORT command (writing a zero value to SCBPort, offset 8 in the CSR). The PORT command is also used to self-test the 82559.
The kernel also needs to know what functions to call to open the device (speedo_open), transmit (speedo_start_xmit), close/stop (speedo_close), get stats (speedo_get_stats) and do IOCTL (speedo_ioctl). Notice that there is no receive function. This is because packets are received asynchronously. When a new packet is received, the 82559 interrupts the kernel and the interrupt service routine handles the received packet (more on this later). At this point the timer routines are also initialized.
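The function-pointer arrangement can be illustrated with a stripped-down model. Everything below is hypothetical: `struct netdev` is a tiny stand-in for the kernel's net_device, and the `demo_*` functions stand in for speedo_open() and friends.

```c
#include <stddef.h>

struct netdev {                       /* hypothetical, far smaller than net_device */
    const char *name;
    int up;                           /* 1 while the interface is open */
    int tx_packets;
    int (*open)(struct netdev *dev);
    int (*stop)(struct netdev *dev);
    int (*hard_start_xmit)(const void *frame, size_t len, struct netdev *dev);
};

/* Driver-side implementations. */
static int demo_open(struct netdev *dev) { dev->up = 1; return 0; }
static int demo_stop(struct netdev *dev) { dev->up = 0; return 0; }

static int demo_xmit(const void *frame, size_t len, struct netdev *dev)
{
    (void)frame;
    (void)len;
    if (!dev->up)
        return -1;                    /* refuse to transmit on a closed device */
    dev->tx_packets++;
    return 0;
}

/* What a probe routine does once: fill in the entry points. */
void demo_setup(struct netdev *dev, const char *name)
{
    dev->name = name;
    dev->up = 0;
    dev->tx_packets = 0;
    dev->open = demo_open;
    dev->stop = demo_stop;
    dev->hard_start_xmit = demo_xmit;
}
```

The caller never needs to know which driver is behind the device: it simply calls `dev->open(dev)` or `dev->hard_start_xmit(frame, len, dev)`, and there is no receive pointer because reception is driven by interrupts.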
This completes the initialization of the 82559. The device is now ready to receive and transmit Ethernet frames.
Assigning an IP address to the device
After initializing the device, the device should be opened so that it is accessible from the IP layer. The device is accessible from the outside world when an IP address is assigned to it. One way to assign an IP address to an interface is through the ifconfig program available from net-tools.
The syntax to enable the device is:
ifconfig eth0 up
When asked to bring up the eth0 interface, ifconfig creates a generic socket in the AF_INET address family and issues a SIOCSIFFLAGS ioctl on this socket. The flags set on the interface are IFF_UP and IFF_RUNNING.
/* ifconfig.c */
if (!strcmp(*spp, "up")) {
    goterr |= set_flag(ifr.ifr_name, (IFF_UP | IFF_RUNNING));
    spp++;
    continue;
}

/* Set a certain interface flag. */
static int set_flag(char *ifname, short flag)
{
    struct ifreq ifr;

    safe_strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
    if (ioctl(skfd, SIOCGIFFLAGS, &ifr) < 0) {
        fprintf(stderr, _("%s: unknown interface: %s\n"),
            ifname, strerror(errno));
        return (-1);
    }
    safe_strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
    ifr.ifr_flags |= flag;
    if (ioctl(skfd, SIOCSIFFLAGS, &ifr) < 0) {
        perror("SIOCSIFFLAGS");
        return -1;
    }
    return (0);
}
The userspace ioctl() system call is routed to inet_ioctl(), defined in af_inet.c. For ifconfig (or any interface-type ioctl), inet_ioctl() calls the devinet_ioctl() function.
int devinet_ioctl(unsigned int cmd, void *arg)
{
    struct ifreq ifr;
    struct sockaddr_in sin_orig;
    struct sockaddr_in *sin = (struct sockaddr_in *)&ifr.ifr_addr;
    struct in_device *in_dev;
    struct in_ifaddr **ifap = NULL;
    struct in_ifaddr *ifa = NULL;
    struct net_device *dev;
    char *colon;
    int ret = 0;
    int tryaddrmatch = 0;

    /*
     * Fetch the caller's info block into kernel space
     */

    if (copy_from_user(&ifr, arg, sizeof(struct ifreq)))
        return -EFAULT;
    ifr.ifr_name[IFNAMSIZ-1] = 0;

    /* save original address for comparison */
    memcpy(&sin_orig, sin, sizeof(*sin));

    colon = strchr(ifr.ifr_name, ':');
    if (colon)
        *colon = 0;

#ifdef CONFIG_KMOD
    dev_load(ifr.ifr_name);
#endif

    switch(cmd) {
    case SIOCGIFADDR: /* Get interface address */
    case SIOCGIFBRDADDR: /* Get the broadcast address */
    case SIOCGIFDSTADDR: /* Get the destination address */
    case SIOCGIFNETMASK: /* Get the netmask for the interface */
        /* Note that these ioctls will not sleep,
           so that we do not impose a lock.
           One day we will be forced to put shlock here (I mean SMP)
         */
        tryaddrmatch = (sin_orig.sin_family == AF_INET);
        memset(sin, 0, sizeof(*sin));
        sin->sin_family = AF_INET;
        break;

    case SIOCSIFFLAGS:
        if (!capable(CAP_NET_ADMIN))
            return -EACCES;
        break;
    case SIOCSIFADDR: /* Set interface address (and family) */
    case SIOCSIFBRDADDR: /* Set the broadcast address */
    case SIOCSIFDSTADDR: /* Set the destination address */
    case SIOCSIFNETMASK: /* Set the netmask for the interface */
        if (!capable(CAP_NET_ADMIN))
            return -EACCES;
        if (sin->sin_family != AF_INET)
            return -EINVAL;
        break;
    default:
        return -EINVAL;
    }

    dev_probe_lock();
    rtnl_lock();

    if ((dev = __dev_get_by_name(ifr.ifr_name)) == NULL) {
        ret = -ENODEV;
        goto done;
    }

    if (colon)
        *colon = ':';

    if ((in_dev=__in_dev_get(dev)) != NULL) {
        if (tryaddrmatch) {
            /* Matthias Andree */
            /* compare label and address (4.4BSD style) */
            /* note: we only do this for a limited set of ioctls
               and only if the original address family was AF_INET.
               This is checked above. */
            for (ifap=&in_dev->ifa_list; (ifa=*ifap) != NULL; ifap=&ifa->ifa_next) {
                if ((strcmp(ifr.ifr_name, ifa->ifa_label) == 0)
                    && (sin_orig.sin_addr.s_addr == ifa->ifa_address)) {
                    break; /* found */
                }
            }
        }
        /* we didn't get a match, maybe the application is
           4.3BSD-style and passed in junk so we fall back to
           comparing just the label */
        if (ifa == NULL) {
            for (ifap=&in_dev->ifa_list; (ifa=*ifap) != NULL; ifap=&ifa->ifa_next)
                if (strcmp(ifr.ifr_name, ifa->ifa_label) == 0)
                    break;
        }
    }

    if (ifa == NULL && cmd != SIOCSIFADDR && cmd != SIOCSIFFLAGS) {
        ret = -EADDRNOTAVAIL;
        goto done;
    }

    switch(cmd) {
    case SIOCGIFADDR: /* Get interface address */
        sin->sin_addr.s_addr = ifa->ifa_local;
        goto rarok;

    case SIOCGIFBRDADDR: /* Get the broadcast address */
        sin->sin_addr.s_addr = ifa->ifa_broadcast;
        goto rarok;

    case SIOCGIFDSTADDR: /* Get the destination address */
        sin->sin_addr.s_addr = ifa->ifa_address;
        goto rarok;

    case SIOCGIFNETMASK: /* Get the netmask for the interface */
        sin->sin_addr.s_addr = ifa->ifa_mask;
        goto rarok;

    case SIOCSIFFLAGS:
        if (colon) {
            if (ifa == NULL) {
                ret = -EADDRNOTAVAIL;
                break;
            }
            if (!(ifr.ifr_flags&IFF_UP))
                inet_del_ifa(in_dev, ifap, 1);
            break;
        }
        ret = dev_change_flags(dev, ifr.ifr_flags);
        break;

    case SIOCSIFADDR: /* Set interface address (and family) */
        if (inet_abc_len(sin->sin_addr.s_addr) < 0) {
            ret = -EINVAL;
            break;
        }

        if (!ifa) {
            if ((ifa = inet_alloc_ifa()) == NULL) {
                ret = -ENOBUFS;
                break;
            }
            if (colon)
                memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
            else
                memcpy(ifa->ifa_label, dev->name, IFNAMSIZ);
        } else {
            ret = 0;
            if (ifa->ifa_local == sin->sin_addr.s_addr)
                break;
            inet_del_ifa(in_dev, ifap, 0);
            ifa->ifa_broadcast = 0;
            ifa->ifa_anycast = 0;
        }

        ifa->ifa_address =
        ifa->ifa_local = sin->sin_addr.s_addr;

        if (!(dev->flags&IFF_POINTOPOINT)) {
            ifa->ifa_prefixlen = inet_abc_len(ifa->ifa_address);
            ifa->ifa_mask = inet_make_mask(ifa->ifa_prefixlen);
            if ((dev->flags&IFF_BROADCAST) && ifa->ifa_prefixlen < 31)
                ifa->ifa_broadcast = ifa->ifa_address|~ifa->ifa_mask;
        } else {
            ifa->ifa_prefixlen = 32;
            ifa->ifa_mask = inet_make_mask(32);
        }
        ret = inet_set_ifa(dev, ifa);
        break;

    case SIOCSIFBRDADDR: /* Set the broadcast address */
        if (ifa->ifa_broadcast != sin->sin_addr.s_addr) {
            inet_del_ifa(in_dev, ifap, 0);
            ifa->ifa_broadcast = sin->sin_addr.s_addr;
            inet_insert_ifa(ifa);
        }
        break;

    case SIOCSIFDSTADDR: /* Set the destination address */
        if (ifa->ifa_address != sin->sin_addr.s_addr) {
            if (inet_abc_len(sin->sin_addr.s_addr) < 0) {
                ret = -EINVAL;
                break;
            }
            inet_del_ifa(in_dev, ifap, 0);
            ifa->ifa_address = sin->sin_addr.s_addr;
            inet_insert_ifa(ifa);
        }
        break;

    case SIOCSIFNETMASK: /* Set the netmask for the interface */

        /*
         * The mask we set must be legal.
         */
        if (bad_mask(sin->sin_addr.s_addr, 0)) {
            ret = -EINVAL;
            break;
        }

        if (ifa->ifa_mask != sin->sin_addr.s_addr) {
            inet_del_ifa(in_dev, ifap, 0);
            ifa->ifa_mask = sin->sin_addr.s_addr;
            ifa->ifa_prefixlen =
                inet_mask_len(ifa->ifa_mask);

            /* See if current broadcast address matches
             * with current netmask, then recalculate
             * the broadcast address. Otherwise it's a
             * funny address, so don't touch it since
             * the user seems to know what (s)he's doing...
             */
            if ((dev->flags & IFF_BROADCAST) &&
                (ifa->ifa_prefixlen < 31) &&
                (ifa->ifa_broadcast ==
                 (ifa->ifa_local|~ifa->ifa_mask))) {
                ifa->ifa_broadcast =
                    (ifa->ifa_local |
                     ~sin->sin_addr.s_addr);
            }
            inet_insert_ifa(ifa);
        }
        break;
    }
done:
    rtnl_unlock();
    dev_probe_unlock();
    return ret;

rarok:
    rtnl_unlock();
    dev_probe_unlock();
    if (copy_to_user(arg, &ifr, sizeof(struct ifreq)))
        return -EFAULT;
    return 0;
}
devinet_ioctl() copies the user-space ifreq structure, which contains the name of our interface and the IP address, into kernel space. Based on the name of the interface (eth0, for example), the device structure, net_device (remember we created this in speedo_found1() above?), is looked up [*__dev_get_by_name()*]. The IP address is then assigned to this device by *inet_set_ifa()*.
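The SIOCSIFADDR branch in the listing above derives a default netmask and broadcast address from the class of the new address. Here is a minimal sketch of that arithmetic. It assumes host byte order for readability (the kernel operates on network-order values), and the helper names are hypothetical stand-ins, not the kernel's own functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the classful prefix lookup.
 * Returns -1 for addresses outside classes A, B and C.
 * Host byte order is assumed. */
static int classful_prefix_len(uint32_t addr)
{
    if ((addr >> 24) == 0)
        return 0;                                /* 0.x.x.x */
    if ((addr & 0x80000000u) == 0)
        return 8;                                /* class A: leading bit 0 */
    if ((addr & 0xC0000000u) == 0x80000000u)
        return 16;                               /* class B: leading bits 10 */
    if ((addr & 0xE0000000u) == 0xC0000000u)
        return 24;                               /* class C: leading bits 110 */
    return -1;                                   /* class D or E */
}

/* Builds a netmask with len leading one bits. */
static uint32_t prefix_to_mask(int len)
{
    assert(len >= 0 && len <= 32);
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Same expression as in the listing: address OR inverted mask. */
static uint32_t broadcast_for(uint32_t addr, uint32_t mask)
{
    return addr | ~mask;
}
```

For 192.168.1.5 (0xC0A80105) this gives a prefix length of 24, a mask of 255.255.255.0 and a broadcast address of 192.168.1.255.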
Shared Memory Communication Architecture
After initialization, 82559 is ready for its normal operation. As a Fast Ethernet Controller, its normal operation is to transmit and receive data packets. As a PCI bus master device, 82559 works independently, without CPU intervention. The CPU provides the 82559 with action commands and pointers to the data buffers that reside in host main memory. The 82559 independently manages these structures and initiates burst memory cycles to transfer data to and from main memory.
The CPU controls and examines 82559 via its control and status structures. Some of these control and status structures reside within the 82559 and some reside in system memory. For transfer of data to/from the CPU, the 82559 establishes a shared memory communication with the host CPU. This shared memory is divided into three parts:
- The Control/Status Registers (CSR)
- The Command Block List (CBL) or just Command List (CL)
- The Receive Frame Area (RFA).
The CSR resides on-chip and can be accessed by either I/O or memory cycles once the PCI BIOS has mapped it to a region accessible by the CPU (see the section PCI Kernel Initialization). The CBL and RFA reside in system (host) memory.
The Command Block List (CBL) is a linked list of commands to be executed by the 82559. The Receive Frame Area (RFA) is a linked list of data structures that hold the received packets (frames).
Controlling 82559 through CSR
The 82559 has seven Control/Status registers which make up the CSR space.
The first 8 bytes of the CSR are called the System Control Block (SCB). The SCB serves as a central communication point for exchanging control and status information between the host CPU and the 82559.
The CPU instructs the 82559 to Activate, Suspend, Resume or Idle the Command Unit (CU) or Receive Unit (RU) by placing the appropriate control command in the CU or RU control field of SCB. Activating the CU causes the 82559 to begin transmitting packets. When transmission is completed, the 82559 updates the SCB with the CU status then interrupts the CPU, if configured to do so. Activating the RU causes the 82559 to go into the READY state for frame reception. When a frame is received the RU updates the SCB with the RU status and interrupts the CPU.
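To make the handshake through the SCB more concrete, here is a small simulation in plain C. It is only a sketch: the struct models the 8-byte SCB in ordinary memory instead of device registers, the command values are assumptions about the chip's encoding, and scb_issue() and scb_device_accept() are hypothetical names. The idea it shows is that the driver may only write a new command once the device has accepted the previous one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed command encodings, for illustration only. */
#define SIM_RU_START  0x01u
#define SIM_CU_START  0x10u
#define SIM_CU_RESUME 0x20u

/* Simulated 8-byte System Control Block. On real hardware these are
 * device registers reached through I/O or memory-mapped accesses. */
struct scb_model {
    uint16_t status;       /* CU/RU state and interrupt causes, set by the device */
    uint8_t  cmd;          /* CU/RU command byte, written by the driver */
    uint8_t  intr_mask;    /* interrupt control byte */
    uint32_t general_ptr;  /* address argument used by some commands */
};

/* Driver side: returns 0 on success, -1 if a command is still pending. */
static int scb_issue(struct scb_model *scb, uint8_t cmd, uint32_t ptr)
{
    if (scb->cmd != 0)
        return -1;              /* device has not accepted the last command */
    scb->general_ptr = ptr;     /* the pointer must be in place first */
    scb->cmd = cmd;
    return 0;
}

/* Device side (simulated): accept the command and clear the byte. */
static uint8_t scb_device_accept(struct scb_model *scb)
{
    uint8_t cmd = scb->cmd;
    scb->cmd = 0;
    return cmd;
}
```

A driver would poll until the command byte reads zero before issuing the next command, which is the role wait_for_cmd_done() plays in the transmit listing later in this article.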
Command Block List (CBL) and Transmitted Frame
Transmit or configure commands issued by CPU are wrapped inside what are called Command Blocks (CB). These command blocks are chained together to form the CBL.
Action commands are categorized into two types:
- Non-Tx commands: This category includes commands such as NOP, Configure, IA Setup, Multicast Setup, Dump and Diagnose.
- Tx command: This command causes the 82559 to transmit a frame. A transmit command block contains (in the parameter field) the destination address, length of the transmitted frame and a pointer to buffer area in memory containing the data portion of the frame. The data field is contained in a memory data structure consisting of a buffer descriptor (BD) and a data buffer, or a linked list of buffer descriptors and buffers (as shown in figure below).
When eepro100 is ready to transmit a packet, it must create this Tx command block and send it to 82559. This Tx Command block is a structure called TxFD (Transmit Frame Descriptor).
406 #define CONFIG_DATA_SIZE 22
407 struct TxFD { /* Transmit frame descriptor set. */
408 s32 status;
409 u32 link; /* void * */
410 u32 tx_desc_addr; /* Always points to the tx_buf_addr element. */
411 s32 count; /* # of TBD (=1), Tx start thresh., etc. */
412 /* This constitutes two "TBD" entries -- we only use one. */
413 #define TX_DESCR_BUF_OFFSET 16
414 u32 tx_buf_addr0; /* void *, frame to be transmitted. */
415 s32 tx_buf_size0; /* Length of Tx frame. */
416 u32 tx_buf_addr1; /* void *, frame to be transmitted. */
417 s32 tx_buf_size1; /* Length of Tx frame. */
418 /* the structure must have space for at least CONFIG_DATA_SIZE starting
419 * from tx_desc_addr field */
420 };
This TxFD can hold one TxCB and two Tx Buffer Descriptors (TxBD). During eepro100 initialization (speedo_found1()), a fixed number of these TxFDs are created and linked together into a ring (tx_ring_space). When new data is available for transmission, one of the TxFDs is fetched from the ring and sent to the 82559 for transmission.
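The ring shape can be sketched in a few lines of C. This is an illustration under assumptions: the ring size and the DMA base address are made up, the sim_ names are hypothetical, and every link is filled in up front only to show the structure. In the driver listing later in this article, the link field is written as each entry is queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_TX_RING_SIZE 8   /* hypothetical size for the sketch */

/* Same field layout as the TxFD shown above. */
struct sim_txfd {
    int32_t  status;
    uint32_t link;           /* bus address of the next descriptor */
    uint32_t tx_desc_addr;   /* bus address of this entry's first TBD */
    int32_t  count;
    uint32_t tx_buf_addr0;
    int32_t  tx_buf_size0;
    uint32_t tx_buf_addr1;
    int32_t  tx_buf_size1;
};

/* Bus address of ring element i, given the ring's DMA base address. */
static uint32_t sim_elem_dma(uint32_t ring_dma, unsigned i)
{
    return ring_dma + (uint32_t)(i * sizeof(struct sim_txfd));
}

/* Links every descriptor to its successor; the last wraps to the first. */
static void sim_link_ring(struct sim_txfd *ring, uint32_t ring_dma)
{
    for (unsigned i = 0; i < SIM_TX_RING_SIZE; i++) {
        unsigned next = (i + 1) % SIM_TX_RING_SIZE;
        ring[i].status = 0;
        ring[i].link = sim_elem_dma(ring_dma, next);
        ring[i].tx_desc_addr = sim_elem_dma(ring_dma, i)
            + (uint32_t)offsetof(struct sim_txfd, tx_buf_addr0);
    }
}
```

Note that offsetof(struct sim_txfd, tx_buf_addr0) is 16, which matches TX_DESCR_BUF_OFFSET in the driver's definition.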
The status field of TxFD is a bit field. Its upper 16 bits carry the command, which can be a combination of:
342 /* Commands that can be put in a command list entry. */
343 enum commands {
344 CmdNOp = 0, CmdIASetup = 0x10000, CmdConfigure = 0x20000,
345 CmdMulticastList = 0x30000, CmdTx = 0x40000, CmdTDR = 0x50000,
346 CmdDump = 0x60000, CmdDiagnose = 0x70000,
347 CmdSuspend = 0x40000000, /* Suspend after completion. */
348 CmdIntr = 0x20000000, /* Interrupt after completion. */
349 CmdTxFlex = 0x00080000, /* Use "Flexible mode" for CmdTx command. */
350 };
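As a quick illustration of how these bits combine, the sketch below builds the command word for a transmit entry and picks it apart again. The constants are copied from the enum above and from StatusComplete in the driver. The SIM_ names and helper functions are hypothetical, and reading the opcode from bits 16 to 18 is an inference from the enum values.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_CMD_TX          0x00040000u
#define SIM_CMD_TX_FLEX     0x00080000u
#define SIM_CMD_INTR        0x20000000u
#define SIM_CMD_SUSPEND     0x40000000u
#define SIM_STATUS_COMPLETE 0x00008000u   /* set by the chip when done */

/* Builds the word a transmit entry starts out with. */
static uint32_t make_tx_command(int want_interrupt)
{
    uint32_t word = SIM_CMD_SUSPEND | SIM_CMD_TX | SIM_CMD_TX_FLEX;
    if (want_interrupt)
        word |= SIM_CMD_INTR;
    return word;
}

/* The opcode (NOP = 0 ... Diagnose = 7) sits in bits 16 to 18. */
static unsigned command_opcode(uint32_t word)
{
    return (word >> 16) & 0x7u;
}

/* The lower half of the word is status written back by the chip. */
static int command_complete(uint32_t word)
{
    return (word & SIM_STATUS_COMPLETE) != 0;
}
```

make_tx_command(0) yields 0x400C0000, and command_opcode() of that word is 4, the opcode of CmdTx.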
Receive Frame Area
To reduce CPU overhead, the 82559 is designed to receive frames without CPU supervision. The host CPU first sets aside adequate receive buffer space and then enables the 82559 Receive Unit (this is done in speedo_init_rx_ring(), which runs from speedo_open() when the device is opened). Once enabled, the RU watches for arriving frames and automatically stores them in the Receive Frame Area (RFA).
The RFA contains Receive Frame Descriptors, Receive Buffer Descriptors, and Receive Buffers (see figure below).
The individual Receive Frame Descriptors make up a Receive Descriptor List (RDL) used by the 82559 to store the destination and source addresses, the length field, and the status of each frame received.
eepro100 representation of the Receive Frame Descriptor (RxFD):
390 /* The Speedo3 Rx and Tx buffer descriptors. */
391 struct RxFD { /* Receive frame descriptor. */
392 volatile s32 status;
393 u32 link; /* struct RxFD * */
394 u32 rx_buf_addr; /* void * */
395 u32 count;
396 } RxFD_ALIGNMENT;
397
398 /* Selected elements of the Tx/RxFD.status word. */
399 enum RxFD_bits {
400 RxComplete=0x8000, RxOK=0x2000,
401 RxErrCRC=0x0800, RxErrAlign=0x0400, RxErrTooBig=0x0200, RxErrSymbol=0x0010,
402 RxEth2Type=0x0020, RxNoMatch=0x0004, RxNoIAMatch=0x0002,
403 TxUnderrun=0x1000, StatusComplete=0x8000,
404 };
Data Transmission
An application calls the write(socket, data, length) system call to write to an open socket. In the kernel, inet_sendmsg() is executed with a pointer to the sock structure. inet_sendmsg() calls the send operation of the corresponding transport protocol, which for TCP is tcp_sendmsg(). tcp_sendmsg() copies the data to be transmitted from user space to the socket and starts the transmit process by calling tcp_send_skb() and subsequently tcp_transmit_skb(). tcp_transmit_skb() adds the TCP header to the packet, calculates the TCP checksum and calls ip_queue_xmit(). ip_queue_xmit() determines the IP route and constructs the IP header. Finally, the MAC address is copied to the packet and dev_queue_xmit() is called to send the packet to the Ethernet device.
dev_queue_xmit() hands the packet to a driver-specific transmit function. In the case of eepro100, this function is speedo_start_xmit() (remember we set this in speedo_found1()?).
1435 static int
1436 speedo_start_xmit(struct sk_buff *skb, struct net_device *dev)
1437 {
1438 struct speedo_private *sp = (struct speedo_private *)dev->priv;
1439 long ioaddr = dev->base_addr;
1440 int entry;
1441
1442 /* Prevent interrupts from changing the Tx ring from underneath us. */
1443 unsigned long flags;
1444
1445 spin_lock_irqsave(&sp->lock, flags);
1446
1447 /* Check if there are enough space. */
1448 if ((int)(sp->cur_tx - sp->dirty_tx) >= TX_QUEUE_LIMIT) {
1449 printk(KERN_ERR "%s: incorrect tbusy state, fixed.\n", dev->name);
1450 netif_stop_queue(dev);
1451 sp->tx_full = 1;
1452 spin_unlock_irqrestore(&sp->lock, flags);
1453 return 1;
1454 }
1455
1456 /* Calculate the Tx descriptor entry. */
1457 entry = sp->cur_tx++ % TX_RING_SIZE;
1458
1459 sp->tx_skbuff[entry] = skb;
1460 sp->tx_ring[entry].status =
1461 cpu_to_le32(CmdSuspend | CmdTx | CmdTxFlex);
1462 if (!(entry & ((TX_RING_SIZE>>2)-1)))
1463 sp->tx_ring[entry].status |= cpu_to_le32(CmdIntr);
1464 sp->tx_ring[entry].link =
1465 cpu_to_le32(TX_RING_ELEM_DMA(sp, sp->cur_tx % TX_RING_SIZE));
1466 sp->tx_ring[entry].tx_desc_addr =
1467 cpu_to_le32(TX_RING_ELEM_DMA(sp, entry) + TX_DESCR_BUF_OFFSET);
1468 /* The data region is always in one buffer descriptor. */
1469 sp->tx_ring[entry].count = cpu_to_le32(sp->tx_threshold);
1470 sp->tx_ring[entry].tx_buf_addr0 =
1471 cpu_to_le32(pci_map_single(sp->pdev, skb->data,
1472 skb->len, PCI_DMA_TODEVICE));
1473 sp->tx_ring[entry].tx_buf_size0 = cpu_to_le32(skb->len);
1474
1475 /* workaround for hardware bug on 10 mbit half duplex */
1476
1477 if ((sp->partner == 0) && (sp->chip_id == 1)) {
1478 wait_for_cmd_done(dev);
1479 outb(0 , ioaddr + SCBCmd);
1480 udelay(1);
1481 }
1482
1483 /* Trigger the command unit resume. */
1484 wait_for_cmd_done(dev);
1485 clear_suspend(sp->last_cmd);
1486 /* We want the time window between clearing suspend flag on the previous
1487 command and resuming CU to be as small as possible.
1488 Interrupts in between are very undesired. --SAW */
1489 outb(CUResume, ioaddr + SCBCmd);
1490 sp->last_cmd = (struct descriptor *)&sp->tx_ring[entry];
1491
1492 /* Leave room for set_rx_mode(). If there is no more space than reserved
1493 for multicast filter mark the ring as full. */
1494 if ((int)(sp->cur_tx - sp->dirty_tx) >= TX_QUEUE_LIMIT) {
1495 netif_stop_queue(dev);
1496 sp->tx_full = 1;
1497 }
1498
1499 spin_unlock_irqrestore(&sp->lock, flags);
1500
1501 dev->trans_start = jiffies;
1502
1503 return 0;
1504 }
speedo_start_xmit() inserts the data received from the kernel (skb) into the Tx ring. If there are no open slots in the Tx ring, netif_stop_queue() is called to ask the kernel to stop sending more packets from the upper layers, and the Tx ring is flagged as full. The new skb (data) to be transmitted is inserted as the data portion of a TxFD at tx_buf_addr0 (see TxFD above).
The TxFD status of this entry is set to CmdSuspend (suspend after completion), CmdTx and CmdTxFlex (flexible transmission mode). The last command inserted into the Tx ring has the CmdSuspend bit set so that the CU is suspended immediately after the last command is executed. This way we prevent any erroneous data from being transmitted. Before resuming the CU, we clear CmdSuspend from the previous command already in the Tx ring [clear_suspend(sp->last_cmd)]. On every entry whose index is a multiple of TX_RING_SIZE/4 (the listing tests entry & ((TX_RING_SIZE>>2)-1)), we also set CmdIntr (interrupt after completion). This causes the chip to generate an interrupt after executing this command. When an interrupt is received after transmit completes, the interrupt handler calls speedo_tx_buffer_gc() to clean up completed and erroneous skbs from the Tx ring.
Finally, we activate the CU to transmit this new packet by writing CUResume to the SCB command register.
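The ring bookkeeping in speedo_start_xmit() is easy to get wrong, so here is a small stand-alone sketch of it. It mirrors three expressions from the listing: the fullness test, the slot calculation and the CmdIntr test. The ring size and queue limit are assumed values, and the sim_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_TX_RING_SIZE   64                      /* assumed, power of two */
#define SIM_TX_QUEUE_LIMIT (SIM_TX_RING_SIZE - 2)  /* assumed reserve */

struct sim_tx_state {
    uint32_t cur_tx;    /* next slot to fill, free-running counter */
    uint32_t dirty_tx;  /* oldest slot not yet reclaimed */
};

/* Unsigned subtraction keeps working when the counters wrap around. */
static int sim_tx_full(const struct sim_tx_state *s)
{
    return (int)(s->cur_tx - s->dirty_tx) >= SIM_TX_QUEUE_LIMIT;
}

/* Claims the next slot; returns its ring index, or -1 if the ring is full. */
static int sim_tx_claim(struct sim_tx_state *s)
{
    if (sim_tx_full(s))
        return -1;
    return (int)(s->cur_tx++ % SIM_TX_RING_SIZE);
}

/* True for entries 0, 16, 32 and 48 with a 64-entry ring. */
static int sim_wants_intr(int entry)
{
    return !(entry & ((SIM_TX_RING_SIZE >> 2) - 1));
}
```

Because cur_tx and dirty_tx are free-running counters, only their difference matters. The modulo is applied just once, when a counter is turned into a ring index.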
Generating the Ethernet Frame
The final ethernet frame sent over the wire is:
82559 automatically generates the preamble (alternating 1s and 0s) and start frame delimiter, fetches the destination address and length field from the Transmit command, inserts its unique MAC address (that it fetched from the external Flash/EEPROM) as the source address, fetches the data field specified by the Transmit command, and computes and appends the CRC to the end of the frame.
This final frame is then handed over to the PHY layer for transmission over the wire.
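The chip does all of this in hardware, but the same steps can be mimicked in software to see what ends up on the wire. The sketch below assembles the destination address, source address, type/length field, payload and CRC. It is an illustration only: build_frame() and crc32_ieee() are hypothetical helpers, padding short frames to the 64-byte minimum is assumed to be enabled, and the preamble and start frame delimiter are left out because the CRC does not cover them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_MIN_NO_FCS 60   /* 64-byte minimum frame minus the 4-byte FCS */

/* Bitwise CRC-32 as used for the Ethernet frame check sequence:
 * reflected polynomial 0xEDB88320, initial value and final XOR all ones. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Writes dst | src | type | payload | padding | FCS into out.
 * Returns the frame length, or 0 if out is too small. */
static size_t build_frame(uint8_t *out, size_t cap,
                          const uint8_t dst[6], const uint8_t src[6],
                          uint16_t ethertype,
                          const uint8_t *payload, size_t plen)
{
    size_t len = 14 + plen;
    if (len < ETH_MIN_NO_FCS)
        len = ETH_MIN_NO_FCS;
    if (cap < len + 4)
        return 0;
    memset(out, 0, len);                      /* zero padding */
    memcpy(out, dst, 6);
    memcpy(out + 6, src, 6);
    out[12] = (uint8_t)(ethertype >> 8);      /* type/length, big endian */
    out[13] = (uint8_t)(ethertype & 0xFF);
    if (plen)
        memcpy(out + 14, payload, plen);
    uint32_t fcs = crc32_ieee(out, len);
    for (int i = 0; i < 4; i++)
        out[len + i] = (uint8_t)(fcs >> (8 * i));  /* low byte first */
    return len + 4;
}
```

A 4-byte payload produces a 64-byte frame: 14 bytes of header, 46 bytes of payload plus padding, and 4 bytes of CRC.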
Bits to Waves
82559 has an internal 82555 Physical Layer Interface (PHY). It is responsible for connecting the 82559 to the actual physical wire over which the data will be carried. PHY converts the incoming digital data to analog signals during transmission and analog signals to digital data during reception.
Signal Transmission
To achieve a high transfer rate (up to 125 Mbps), two tasks must be performed on the data before it is transmitted over the wire:
- scrambling/descrambling
- encoding/decoding
Scrambling/Descrambling
All data transmitted and received over the wire is synchronized with a clock. To keep the receiver in sync with the transmitter, the clock signal has to be embedded in the signal transmitted over the wire itself. The robustness of this digitally transmitted synchronization signal often depends on the statistical nature of the data being transmitted. For example, long runs of consecutive 0s or 1s can cause loss of synchronization, since the receiver clock is derived from the received data. Therefore, the data must contain adequate transitions to ensure that the timing recovery circuit at the receiver stays in synchronization. Scrambling (randomizing) the data over a period of time breaks up these patterns.
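The effect of scrambling can be shown with a toy additive scrambler: each data bit is XORed with a pseudo-random key stream produced by a linear feedback shift register (LFSR). The sketch below assumes an 11-bit LFSR with the polynomial x^11 + x^9 + 1 and a seed shared by both ends. A real PHY has to synchronize its descrambler from the incoming signal, which is not modeled here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advances the 11-bit LFSR by one step and returns the new key bit.
 * The key bit is the XOR of the bits delayed by 9 and 11 steps.
 * The state must be non-zero. */
static unsigned lfsr_next(uint16_t *state)
{
    unsigned bit = ((unsigned)(*state >> 8) ^ (unsigned)(*state >> 10)) & 1u;
    *state = (uint16_t)((((unsigned)*state << 1) | bit) & 0x7FFu);
    return bit;
}

/* XORs every data bit (least significant bit first) with the key stream.
 * XOR is its own inverse, so the same call with the same seed descrambles. */
static void scramble(uint8_t *buf, size_t len, uint16_t seed)
{
    uint16_t state = (uint16_t)(seed & 0x7FFu);
    for (size_t i = 0; i < len; i++)
        for (int b = 0; b < 8; b++)
            buf[i] ^= (uint8_t)(lfsr_next(&state) << b);
}
```

Scrambling a buffer of all zeros yields a bit pattern with regular transitions, and scrambling the result a second time with the same seed returns the original zeros.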
Encoding/Decoding
There are different ways to represent digital 1s and 0s over the wire. A widely used one is the Non-Return to Zero Inverted (NRZI) format. NRZI is a two-level unipolar code (0 and V) that represents a “one” by a transition between the two levels and a “zero” by no transition, as shown in the figure below. Another format is MLT-3. MLT-3 is a three-level code that represents a “one” by a transition between two adjacent levels and a “zero” by no transition, as shown in the figure below. MLT-3 has the advantage that its maximum fundamental frequency is one-half that of NRZI. With the MLT-3 coding scheme, 90% of the spectral energy is below 40MHz versus 70MHz for NRZI. Thus we can achieve the same data rate as NRZI without requiring a wideband transmission medium. The work of the encoder/decoder is to convert between NRZI and MLT-3.
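Both line codes are simple enough to write down directly. The sketch below encodes a bit sequence as NRZI levels (0 or 1) and as MLT-3 levels (-1, 0 or +1), and decodes either one by looking for level changes. The function names are hypothetical and the levels are abstract integers, not voltages.

```c
#include <assert.h>
#include <stddef.h>

/* NRZI: a 1 toggles the line level, a 0 holds it. */
static void nrzi_encode(const int *bits, int *levels, size_t n, int start)
{
    int level = start;
    for (size_t i = 0; i < n; i++) {
        if (bits[i])
            level = !level;
        levels[i] = level;
    }
}

/* MLT-3: a 1 moves one step through the cycle 0, +1, 0, -1;
 * a 0 holds the current level. The line starts at level 0. */
static void mlt3_encode(const int *bits, int *levels, size_t n)
{
    static const int cycle[4] = { 0, 1, 0, -1 };
    unsigned phase = 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i])
            phase = (phase + 1) & 3u;
        levels[i] = cycle[phase];
    }
}

/* Both codes decode the same way: any change of level is a 1. */
static void transition_decode(const int *levels, int *bits, size_t n, int start)
{
    int prev = start;
    for (size_t i = 0; i < n; i++) {
        bits[i] = (levels[i] != prev);
        prev = levels[i];
    }
}
```

For a stream of all ones, the NRZI output repeats every two bits (1, 0, 1, 0, ...) while the MLT-3 output repeats every four bits (+1, 0, -1, 0, ...), which is where the halved fundamental frequency comes from.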
Finally, the MLT-3 encoded data is transmitted over the wire. It is important to isolate the PHY from the CAT-5 Ethernet cable for load balancing and also feedback. This is done by using specialized Ethernet magnetics with each side of the transformer referenced to the appropriate ground.
{% cimg http://interviewquestions.pupilgarage.com/images/EC%20Images/EC_Fig03.gif PHY to Magnetics interface %}
Signal Reception
Once the PHY detects signals on the receive side, it decodes and descrambles them to reconstruct the data sent by the transmitter.
Receiving Frames
To reduce CPU overhead, the 82559 is designed to receive frames without CPU supervision. The eepro100 driver has already set up the address of the receive buffer ring in the SCB as part of initialization. Once the 82559 receive unit (RU) is enabled, the RU watches for arriving frames and automatically stores them in the Rx ring / Receive Frame Area (RFA). The RFA contains Receive Frame Descriptors, Receive Buffer Descriptors, and Data Buffers (see Figure 2). The individual Receive Frame Descriptors make up a Receive Descriptor List (RDL) used by the 82559 to store the destination and source addresses, the length field, and the status of each frame received.
The 82559 checks each passing frame for an address match. It recognizes its own unique address, one or more multicast addresses, or the broadcast address. If a match is found, the 82559 stores the destination address, source address and length field in the next available Receive Frame Descriptor (RFD). It then begins filling the next available Data Buffer on the Receive Buffer Descriptor (RBD). As one Data Buffer is filled, the 82559 automatically fetches the next Data Buffer and RBD until the entire frame is received.
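The address match described above can be pictured as the following software check. This is only a sketch of the decision, not of how the chip implements it internally: the multicast addresses are compared one by one from a flat array, which is a simplification, and match_dest() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum match_kind { MATCH_NONE, MATCH_UNICAST, MATCH_BROADCAST, MATCH_MULTICAST };

/* mcast points to n_mcast addresses of 6 bytes each, stored back to back. */
static enum match_kind match_dest(const uint8_t dst[6], const uint8_t own[6],
                                  const uint8_t *mcast, size_t n_mcast)
{
    static const uint8_t bcast[6] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};

    if (memcmp(dst, bcast, 6) == 0)
        return MATCH_BROADCAST;
    if (memcmp(dst, own, 6) == 0)
        return MATCH_UNICAST;
    if (dst[0] & 0x01) {                 /* group bit set: multicast */
        for (size_t i = 0; i < n_mcast; i++)
            if (memcmp(dst, mcast + 6 * i, 6) == 0)
                return MATCH_MULTICAST;
    }
    return MATCH_NONE;                   /* frame is ignored */
}
```

Only frames for which the result is not MATCH_NONE would be stored in the Receive Frame Area.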
Once the entire frame is received without error, a frame received interrupt status bit is posted in the SCB and an interrupt is sent to the CPU.
The interrupt handler (speedo_interrupt()) checks if the receive interrupt bit is set in SCB and calls speedo_rx() to handle the received packet.
1756 static int
1757 speedo_rx(struct net_device *dev)
1758 {
1759 struct speedo_private *sp = (struct speedo_private *)dev->priv;
1760 int entry = sp->cur_rx % RX_RING_SIZE;
1761 int rx_work_limit = sp->dirty_rx + RX_RING_SIZE - sp->cur_rx;
1762 int alloc_ok = 1;
1763 int npkts = 0;
1764
1765 if (netif_msg_intr(sp))
1766 printk(KERN_DEBUG " In speedo_rx().\n");
1767 /* If we own the next entry, it's a new packet. Send it up. */
1768 while (sp->rx_ringp[entry] != NULL) {
1769 int status;
1770 int pkt_len;
1771
1772 pci_dma_sync_single(sp->pdev, sp->rx_ring_dma[entry],
1773 sizeof(struct RxFD), PCI_DMA_FROMDEVICE);
1774 status = le32_to_cpu(sp->rx_ringp[entry]->status);
1775 pkt_len = le32_to_cpu(sp->rx_ringp[entry]->count) & 0x3fff;
1776
1777 if (!(status & RxComplete))
1778 break;
1779
1780 if (--rx_work_limit < 0)
1781 break;
1782
1783 /* Check for a rare out-of-memory case: the current buffer is
1784 the last buffer allocated in the RX ring. --SAW */
1785 if (sp->last_rxf == sp->rx_ringp[entry]) {
1786 /* Postpone the packet. It'll be reaped at an interrupt when this
1787 packet is no longer the last packet in the ring. */
1788 if (netif_msg_rx_err(sp))
1789 printk(KERN_DEBUG "%s: RX packet postponed!\n",
1790 dev->name);
1791 sp->rx_ring_state |= RrPostponed;
1792 break;
1793 }
1794
1795 if (netif_msg_rx_status(sp))
1796 printk(KERN_DEBUG " speedo_rx() status %8.8x len %d.\n", status,
1797 pkt_len);
1798 if ((status & (RxErrTooBig|RxOK|0x0f90)) != RxOK) {
1799 if (status & RxErrTooBig)
1800 printk(KERN_ERR "%s: Ethernet frame overran the Rx buffer, "
1801 "status %8.8x!\n", dev->name, status);
1802 else if (! (status & RxOK)) {
1803 /* There was a fatal error. This *should* be impossible. */
1804 sp->stats.rx_errors++;
1805 printk(KERN_ERR "%s: Anomalous event in speedo_rx(), "
1806 "status %8.8x.\n",
1807 dev->name, status);
1808 }
1809 } else {
1810 struct sk_buff *skb;
1811
1812 /* Check if the packet is long enough to just accept without
1813 copying to a properly sized skbuff. */
1814 if (pkt_len < rx_copybreak
1815 && (skb = dev_alloc_skb(pkt_len + 2)) != 0) {
1816 skb->dev = dev;
1817 skb_reserve(skb, 2); /* Align IP on 16 byte boundaries */
1818 /* 'skb_put()' points to the start of sk_buff data area. */
1819 pci_dma_sync_single(sp->pdev, sp->rx_ring_dma[entry],
1820 sizeof(struct RxFD) + pkt_len, PCI_DMA_FROMDEVICE);
1821
1822 #if 1 || USE_IP_CSUM
1823 /* Packet is in one chunk -- we can copy + cksum. */
1824 eth_copy_and_sum(skb, sp->rx_skbuff[entry]->tail, pkt_len, 0);
1825 skb_put(skb, pkt_len);
1826 #else
1827 memcpy(skb_put(skb, pkt_len), sp->rx_skbuff[entry]->tail,
1828 pkt_len);
1829 #endif
1830 npkts++;
1831 } else {
1832 /* Pass up the already-filled skbuff. */
1833 skb = sp->rx_skbuff[entry];
1834 if (skb == NULL) {
1835 printk(KERN_ERR "%s: Inconsistent Rx descriptor chain.\n",
1836 dev->name);
1837 break;
1838 }
1839 sp->rx_skbuff[entry] = NULL;
1840 skb_put(skb, pkt_len);
1841 npkts++;
1842 sp->rx_ringp[entry] = NULL;
1843 pci_unmap_single(sp->pdev, sp->rx_ring_dma[entry],
1844 PKT_BUF_SZ + sizeof(struct RxFD), PCI_DMA_FROMDEVICE);
1845 }
1846 skb->protocol = eth_type_trans(skb, dev);
1847 netif_rx(skb);
1848 sp->stats.rx_packets++;
1849 sp->stats.rx_bytes += pkt_len;
1850 }
1851 entry = (++sp->cur_rx) % RX_RING_SIZE;
1852 sp->rx_ring_state &= ~RrPostponed;
1853 /* Refill the recently taken buffers.
1854 Do it one-by-one to handle traffic bursts better. */
1855 if (alloc_ok && speedo_refill_rx_buf(dev, 0) == -1)
1856 alloc_ok = 0;
1857 }
1858
1859 /* Try hard to refill the recently taken buffers. */
1860 speedo_refill_rx_buffers(dev, 1);
1861
1862 if (npkts)
1863 sp->last_rx_time = jiffies;
1864
1865 return 0;
1866 }
Receiving is a simple process, as most of the work is handled by the 82559 chip. While storing the packet on the Rx ring, the 82559 has already determined the size of the packet; it stores this along with other relevant information, such as the receive status, in the Rx buffer. speedo_rx() checks the status to make sure there were no errors in receiving the frame.
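The status test in speedo_rx() packs several cases into one masked comparison. The sketch below spells those cases out, using the bit values from the RxFD_bits enum and the masks 0x0f90 and 0x3fff from the listing. The SIM_ names, the verdict enum and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_RX_COMPLETE   0x8000u
#define SIM_RX_OK         0x2000u
#define SIM_RX_ERR_TOOBIG 0x0200u
#define SIM_RX_ERR_MASK   0x0f90u   /* error bits tested in the listing */
#define SIM_RX_LEN_MASK   0x3fffu   /* frame length within the count field */

enum rx_verdict { RX_PENDING, RX_GOOD, RX_TOO_BIG, RX_FATAL, RX_DROPPED };

static enum rx_verdict classify_rx(uint32_t status)
{
    if (!(status & SIM_RX_COMPLETE))
        return RX_PENDING;          /* the chip is still filling this entry */
    if ((status & (SIM_RX_ERR_TOOBIG | SIM_RX_OK | SIM_RX_ERR_MASK))
            == SIM_RX_OK)
        return RX_GOOD;             /* OK set and no error bit set */
    if (status & SIM_RX_ERR_TOOBIG)
        return RX_TOO_BIG;          /* frame overran the Rx buffer */
    if (!(status & SIM_RX_OK))
        return RX_FATAL;            /* counted in rx_errors by the driver */
    return RX_DROPPED;              /* OK set together with an error bit */
}

static unsigned rx_length(uint32_t count)
{
    return count & SIM_RX_LEN_MASK;
}
```

Only the RX_GOOD case reaches the branch of the listing that hands the packet to the upper layers.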
The driver receives packets into full-sized buffers of 1560 bytes. When a packet comes in, the driver needs to make a decision: does it use the whole 1560 bytes for this packet, or does it allocate a smaller buffer on the fly and copy the data into it? If the received frame is smaller than rx_copybreak, a new buffer is allocated and the data is copied into it. Otherwise, we remove the received skbuff (leaving a hole in the Rx ring) and pass this buffer to the upper layers. We later call speedo_refill_rx_buffers() to refill the hole in the Rx ring.
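The copybreak decision can be sketched with plain heap buffers standing in for skbuffs. This is a simplified model under assumptions: the threshold of 200 bytes is an assumed value, malloc() stands in for the kernel's skbuff allocation, and sim_take_packet() is a hypothetical name. As in the listing, a failed allocation of the small buffer falls back to handing over the full-sized one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIM_RX_COPYBREAK 200   /* assumed threshold */

/* One Rx ring slot and the full-sized buffer attached to it. */
struct sim_slot {
    uint8_t *buf;
};

/* Returns the buffer to hand to the upper layers. *copied is set to 1 if
 * the slot kept its buffer, or 0 if the slot now has a hole to refill. */
static uint8_t *sim_take_packet(struct sim_slot *slot, size_t pkt_len,
                                int *copied)
{
    if (pkt_len < SIM_RX_COPYBREAK) {
        uint8_t *small = malloc(pkt_len ? pkt_len : 1);
        if (small != NULL) {
            memcpy(small, slot->buf, pkt_len);
            *copied = 1;
            return small;
        }
        /* Allocation failed: fall through and pass up the big buffer. */
    }
    uint8_t *full = slot->buf;
    slot->buf = NULL;              /* leaves a hole in the ring */
    *copied = 0;
    return full;
}
```

The trade-off is a copy of a small packet versus allocating a fresh full-sized buffer for the ring. Copying small packets keeps the large buffers in the ring and avoids tying up 1560 bytes for a 64-byte frame.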
The protocol type of this packet is determined by calling eth_type_trans(). The packet is then queued for the upper layers to process using netif_rx(). Finally, we refill the buffer we just took out of the Rx ring.