CAN bus

From dankwiki

The Controller Area Network (CAN) bus standards describe a two-wire, serial, multi-master, synchronized (but clockless), broadcast-only system designed for vehicles. CAN is one of the mandated transports for OBD-II on-board diagnostics, and has been required in all US vehicles since 2008 (European vehicles employ EOBD, required since 2004). Maximum bus length decreases as bit rate increases, but at low rates a bus can run for kilometers. It offers no security mechanisms.

Standards

  • CAN-1 (1986): Original Bosch protocol
  • CAN-2.0 (1991): Bosch update, later standardized by ISO 11898 (1993)
    • CAN-2.0A (ISO 11898-3, 2006): 11-bit identifiers, up to 125Kbit/s, up to 32 nodes (CAN-Lo, "Basic CAN", "Reliable CAN"). Star/linear bus.
    • CAN-2.0B (ISO 11898-2, 2003): 29-bit identifiers, up to 1Mbit/s, up to 110 nodes (CAN-High, "Full CAN"). Linear bus, 120Ω at each end.
      • CAN-FD (ISO 11898-1, 2015): Extension to CAN-2.0B for up to 64B messages + better CRC + 12Mbit/s
  • ISO 11898-1: Data link layer common to CAN-2.0A and B
  • ISO 15765-2 (2016): "Road vehicles: Diagnostic communication over CAN" ISO-TP L3/L4 for larger (up to 4095B) packets, 15Mbit/s
  • ISO 11783 (2017): "ISOBUS" at 250Kbit/s, 4-wire terminating plug-and-play bias circuits (power, ground, CAN), 30 node max

The mechanical connector is not mandated by any CAN standard, and proprietary ones are regularly used. External interfaces will often provide e.g. DE-9 (female on the bus, male on the ECU).

The Society of Automotive Engineers defines protocols atop CAN:

  • SAE J1962: OBD-II diagnostic connector
    • 16-pin (2x8) female connectors
    • J1962A is 12V, J1962B (interrupted middle groove) is 24V
  • SAE J1939: in-vehicle network for large vehicles
  • SAE J2284: 500Kbit/s in-vehicle network for cars
  • SAE J1979: Standard diagnostics objects making up the base of OBD-II
  • KWP2000: ISO 14230 application layer for OBD

CiA is the working group CAN in Automation:

  • CiA 301: CANopen, standardized communication objects (COBs) for EN 50325-4
  • UDS: ISO 14229-1, Unified Diagnostic Services

Participants

The various nodes of a CAN are known as ECUs (Electronic Control Units). Each can be assumed to include a microprocessor, a (possibly integrated) ISO 11898-1 CAN controller, and an ISO 11898-2/3 transceiver ("medium access unit"). Each ECU must have its own ID unique among the nodes. Depending on the flavor of CAN, IDs are either 11 or 29 bits. The choice of transceiver largely defines the network. Some examples:

  • PCA82C250 -- 1Mbit, unlikely to handle most ISO 11898-1 cable error modes
  • PCA82C252/TJA1054A -- 125Kbit, highly tolerant of 11898 cable fault modes, very low EM emissions, single-wire capability (33.3Kbit/s)
  • MC33897/AU5790 -- 33.33 or 83.33Kbit/s single-wire CAN

Nodes, especially sensors, typically broadcast their state in Data Frames at a regular period.

CAN IDs

Lower IDs have priority over higher IDs. The ID forms the 11-bit "Arbitration Field" of a CAN frame (there are 2 such fields in an Extended Frame, one of 11 and one of 18 bits, separated by two bits). While transmitting a message, each ECU must also listen. If it sees a 0 while sending a 1, it has lost arbitration and must cease to transmit until the current message ends. Logical 0 is the "dominant" signal:

  • If multiple ECUs transmit during the same bit, the bus reads recessive only if all of them transmit recessive; otherwise everyone sees dominant
  • The transition to dominant (actively driven) is faster than the transition from dominant (a passive return to recessive)
  • Synchronization begins on the first recessive to dominant transition after an "interframe space" (3 consecutive recessive bits)
    • This dominant signal is considered the "start bit" of the frame
  • Resynchronization occurs on each subsequent recessive to dominant transition
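The arbitration rule above can be sketched as a small simulation. This is only an illustration, assuming every contender starts its frame on the same bit and that the IDs are distinct; the function name arbitrate is made up for the example.

```python
def arbitrate(ids, width=11):
    """Return the ID that survives bitwise arbitration among the contenders."""
    contenders = set(ids)
    for bit in range(width - 1, -1, -1):  # the most significant bit is sent first
        # Wired-AND bus: it reads dominant (0) if any contender drives 0.
        bus = min((i >> bit) & 1 for i in contenders)
        # A node that sent recessive (1) but read dominant (0) stops transmitting.
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    assert len(contenders) == 1
    return contenders.pop()


print(hex(arbitrate([0x65A, 0x123, 0x124])))  # 0x123, the lowest ID
```

Because dominant is 0 and the high bits go first, the survivor is always the numerically lowest ID, which is why lower IDs have priority.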

Signaling

A CAN bit, signaled using NRZ coding, is divided into four segments, each an integral number of time quanta (so at least four quanta in all):

  • Synchronization segment (it is here that the bit edge takes place)
  • Propagation segment compensating for delay in bus lines
  • Phase Segment I
  • Phase Segment II

The Synchronization segment's transition is a "hard synchronization" -- it defines the bit start time. Resynchronization takes place during the Phase Segments, which are lengthened or shortened based on a phase error relative to the synchronization segment (up to the configured Synchronization Jump Width). Sampling typically takes place between the two Phase Segments, with "triple-sampling" interfaces taking the majority-decode of three samples across the bit.

The total length is referred to as the Nominal Bit Time.
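As a rough worked example, the Nominal Bit Time and the resulting bit rate follow directly from the quantum length and the segment sizes. The sketch below assumes the Synchronization segment is exactly one quantum and that sampling happens at the end of Phase Segment I; the function name and the sample values are illustrative only.

```python
def bit_timing(tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Return (quanta per bit, bit rate in bit/s, sample point as a fraction)."""
    sync_seg = 1  # assumption: the synchronization segment is one quantum long
    quanta = sync_seg + prop_seg + phase_seg1 + phase_seg2
    nominal_bit_time_ns = quanta * tq_ns
    bitrate = 1_000_000_000 / nominal_bit_time_ns
    # Sampling is taken between Phase Segment I and Phase Segment II.
    sample_point = (sync_seg + prop_seg + phase_seg1) / quanta
    return quanta, bitrate, sample_point


quanta, bitrate, sample_point = bit_timing(500, prop_seg=6, phase_seg1=7, phase_seg2=2)
print(quanta, bitrate, sample_point)
```

With a 500ns quantum and segments of 6, 7 and 2 quanta, the bit is 16 quanta (8µs) long, giving 125Kbit/s with the sample point at 87.5% of the bit.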

Frames

A frame begins with a transition from recessive to dominant, the "start bit". It is followed by:

  • 11-bit base identifier, common to CAN-2.0A and B.
    • All transmitting ECUs must be listening during this field, and MUST cease transmitting if preempted by a dominant signal
  • RTR bit. Remote Transmission Requests (recessive) expect a response.
    • Much more common are Data Frames (dominant), which carry payloads and expect no response
    • Note that Data Frames will win arbitration against Remote Frames having the same ID
    • If the subsequent Identifier Extension Bit is recessive, indicating 29-bit identifiers, this bit MUST be recessive, and the true RTR follows the extended identifier
    • In this case, this bit is known as the "Substitute Remote Request" (SRR) bit
  • IDE bit. Identifier Extension. Recessive if and only if an 18-bit extended identifier is to be provided, in which case:
    • 18-bit extended identifier, combining to make a 29-bit arbitrating ID
    • Real RTR bit
    • A dominant reserved bit (must accept but not transmit recessive)
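To make the split concrete, here is a minimal sketch of how a 29-bit ID divides into the 11-bit base identifier and the 18-bit extended identifier, assuming the base identifier carries the most significant bits (it is transmitted first). The helper names are made up for the example.

```python
def split_extended_id(can_id):
    """Split a 29-bit ID into (11-bit base identifier, 18-bit extended identifier)."""
    if not 0 <= can_id < (1 << 29):
        raise ValueError("not a 29-bit identifier")
    base = can_id >> 18            # upper 11 bits, sent before SRR and IDE
    extended = can_id & 0x3FFFF    # lower 18 bits, sent after IDE
    return base, extended


def join_extended_id(base, extended):
    """Inverse of split_extended_id."""
    return (base << 18) | extended


base, extended = split_extended_id(0x18DAF110)
print(hex(base), hex(extended))
```

Since SRR and IDE are both recessive in an Extended Frame, a base-format Data Frame wins arbitration against an Extended Frame whose base identifier has the same 11 bits.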

After the identifier (whether 11-bit or 29-bit) has been transmitted, it is only possible for one ECU to be transmitting in a compliant network (i.e., where all ECUs have distinct IDs). After this point, the two frame formats are unified:

  • A dominant reserved bit
  • 4-bit DLC. Data Length Code, specifying a number of data bytes 0--8.
    • For Data Frames, this is the length of the subsequent Data Field
    • For Remote Frames, this is the length of the expected response's Data Field
    • Larger values (9--15) can appear, but are not used in ISO CAN.
  • 0--8 byte Data Field (Data Frames only)
  • 15-bit CRC
  • A recessive reserved bit (CRC delimiter)
  • 1-bit ACK slot. Recessive for the transmitter, dominant for receivers (who become transmitters for one bit time)
  • A recessive reserved bit (ACK delimiter)
  • 7 recessive reserved bits (Frame delimiter)
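The 15-bit CRC can be sketched as a bit-serial shift register. The generator polynomial 0x4599 and the coverage (start bit through Data Field, before bit stuffing) are taken from the Bosch CAN 2.0 specification rather than from this page; the helper names are illustrative, and only the base (11-bit) Data Frame layout is built.

```python
CAN_CRC15_POLY = 0x4599  # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1


def to_bits(value, width):
    """Big-endian list of bits, most significant first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def crc15(bits):
    """Bit-serial CRC-15 over an unstuffed bit sequence, register starting at 0."""
    crc = 0
    for bit in bits:
        feedback = bit ^ (crc >> 14)
        crc = (crc << 1) & 0x7FFF
        if feedback:
            crc ^= CAN_CRC15_POLY
    return crc


def base_data_frame_bits(can_id, data):
    """Start bit through Data Field of a base-format Data Frame."""
    bits = [0]                      # start bit (dominant)
    bits += to_bits(can_id, 11)     # base identifier
    bits += [0, 0, 0]               # RTR (Data Frame), IDE (base format), reserved
    bits += to_bits(len(data), 4)   # DLC
    for byte in data:
        bits += to_bits(byte, 8)    # Data Field
    return bits


frame = base_data_frame_bits(0x123, b"\x11\x22")
crc = crc15(frame)
# A receiver running the same register over frame + CRC ends at zero.
print(crc15(frame + to_bits(crc, 15)) == 0)
```

The final check is the property receivers rely on: running the register over the received bits followed by the received CRC leaves it at zero when nothing was corrupted.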

Three error frames are defined:

  • Active error: 6 dominant bits followed by 8 recessive bits
    • Will typically destroy any ongoing message
    • Provokes secondary error frames when sent due to a local problem
  • Passive error: 6 recessive bits
  • Overload frame: 6 dominant bits followed by 8 recessive bits, during the interframe space

Error States

Each ECU maintains an RX Error counter and TX Error Counter (REC/TEC), initialized to 0. Successful operations reduce the appropriate counter by 1, though never below 0. Errors increase the counters (usually TEC by 8, REC by 1). Depending on the values of these counters, each ECU is in one of three error states:

  • Error-Active: Both counters are less than 128. The ECU will respond to errors with an Active Error Frame.
  • Error-Passive: One or both counters are greater than 127. The ECU will respond to errors with (network-transparent) Passive Error Frames.
    • The ECU must honor an 8-clock (as opposed to the standard 3) interframe space, the punitive Suspend Transmission Field
    • An error causing the ECU to move into the Error-Passive state triggers an Active (not a Passive) Error Frame
    • An Error-Passive ECU becomes Error-Active as soon as both counters fall below 128.
  • Bus-Off: If the TEC exceeds 255, transmission is disallowed until the host renegotiates with the CAN controller.
    • Even following renegotiation, a substantial delay is mandated (128 instances of 11 consecutive recessive bits).

The intent is error confinement: nodes which seem to see errors that no one else does ought to remove themselves from the network.
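The thresholds above reduce to a small classification function. The sketch below assumes the usual increment of 8 per transmit error mentioned earlier and ignores the special cases in the standard; the function names are made up for the example.

```python
def error_state(tec, rec):
    """Classify an ECU from its TX and RX error counters."""
    if tec > 255:
        return "bus-off"
    if tec > 127 or rec > 127:
        return "error-passive"
    return "error-active"


def failed_transmissions_until(state, step=8):
    """Count consecutive TX errors (TEC += step each) needed to reach a state."""
    tec = 0
    failures = 0
    while error_state(tec, 0) != state:
        tec += step
        failures += 1
    return failures


print(failed_transmissions_until("error-passive"))  # 16
print(failed_transmissions_until("bus-off"))        # 32
```

Under those assumptions, a transmitter whose every frame is destroyed goes Error-Passive after 16 attempts and Bus-Off after 32.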

Errors

  • Bit monitoring -- if transmitting following victory in arbitration, the value read from the bus ought to match what we transmit
  • Bit stuffing -- five repeated transmitted bits must be followed by the opposite signal for one bit, which the receiver removes from the data stream. If six consecutive bits of the same level are seen, it's an RX error
  • Frame check -- invalid value of a specified frame bit
  • ACK failure -- transmitter did not see a dominant signal in the ACK slot
  • CRC failure -- received CRC did not match computed CRC
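The bit stuffing rule lends itself to a short sketch. This version assumes a stuff bit counts as the first bit of the next run, and it treats the whole input as stuffed (in a real frame the stuffed region ends with the CRC); the function names are illustrative.

```python
def stuff(bits):
    """Insert a complementary bit after every run of five equal bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            out.append(1 - b)               # the stuff bit
            run_bit, run_len = 1 - b, 1     # it starts a new run of its own
    return out


def destuff(bits):
    """Remove stuff bits; raise ValueError on six equal bits in a row."""
    out = []
    run_bit, run_len = None, 0
    expect_stuff = False
    for b in bits:
        if expect_stuff:
            if b == run_bit:
                raise ValueError("stuff error: six consecutive equal bits")
            run_bit, run_len = b, 1
            expect_stuff = False
            continue                        # drop the stuff bit
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            expect_stuff = True
    return out


print(stuff([0, 0, 0, 0, 0, 0, 0]))  # [0, 0, 0, 0, 0, 1, 0, 0]
```

The six equal bits of an Active Error Frame violate this rule on purpose, which is how every receiver notices it.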

Linux

You really, really want a device with a SocketCAN driver.

SocketCAN

SocketCAN drivers expose the CAN interface as a standard networking device, visible in /sys/class/net, and usable through the Berkeley sockets API with the PF_CAN protocol family. SocketCAN is documented in the kernel tree at Documentation/networking/can.txt (can.rst in newer kernels). Protocol types include:

  • SOCK_RAW/CAN_RAW: Raw RX/TX
    • CAN_RAW_ERR_FILTER: control passing of Error Frames to the socket (default: disabled)
    • CAN_RAW_LOOPBACK: control whether transmitted frames are seen by other listeners on the host (default: enabled)
    • CAN_RAW_RECV_OWN_MSGS: if CAN_RAW_LOOPBACK is used, control whether transmitter sees their own frames (default: disabled)
    • CAN_RAW_FD_FRAMES: control RX/TX of CAN FD frames (and CANFD_MTU)
    • Receives and sends struct can_frame
      • Determine whether a device is capable of CAN FD by checking its advertised MTU
  • SOCK_DGRAM/CAN_BCM: Broadcast Manager mode, used for cyclic transmissions typical of sensor ECUs
    • Also simple timer-based interface for cyclic RX with failure notification
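A minimal sketch of raw SocketCAN access from Python follows, assuming a Linux build of Python whose socket module exposes AF_CAN. The struct format mirrors the 16-byte classic struct can_frame; the interface name vcan0 and the helper names are assumptions made for the example.

```python
import socket
import struct

CAN_FRAME_FMT = "=IB3x8s"  # can_id, length, 3 padding bytes, 8 data bytes


def pack_frame(can_id, data):
    """Build the 16-byte struct can_frame for a classic CAN frame."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def unpack_frame(raw):
    """Return (can_id including flag bits, payload) from a struct can_frame."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    return can_id, data[:length]


def open_raw(ifname="vcan0"):
    """Open a CAN_RAW socket that also receives this socket's own frames."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_RECV_OWN_MSGS, 1)
    sock.bind((ifname,))
    return sock


# Usage, requiring an existing interface such as vcan0:
#   sock = open_raw("vcan0")
#   sock.send(pack_frame(0x123, b"\x01\x02"))
#   print(unpack_frame(sock.recv(16)))
```

The upper bits of can_id carry flags (extended, remote, and error frame), so mask them off before comparing IDs.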

iproute

SocketCAN devices, including virtual interfaces (vcanX), are configured with the standard ip tool from iproute2.

  • Get statistics: ip -details -statistics link show canX
    • Alternatively, look at /proc/net/can/stats

Options for the device include:

  • restart-ms: automatic restart time in milliseconds in the event of bus-off state
    • restart can be used by itself for one-time bus recovery
    • A restart will send an error frame
  • one-shot: don't attempt to retransmit when we don't see an ACK
    • Note: when sniffing a non-virtual CAN interface, you must either (a) see an ACK -- requiring a connected device, or (b) be in one-shot mode, or you will not see your transmitted frames
  • bitrate: bits per second, e.g. "125000".
    • If CONFIG_CAN_CALC_BITTIMING is enabled in the kernel, this will set CiA timing parameters for the known bitrate.
    • Otherwise, they should be tediously provided via "tq", "prop_seg", "phase_seg1", "phase_seg2", and "sjw".
    • Additional controller-specific timing options can be discovered via ip -details
  • triple-sampling: whether to take three samples per bit and majority-decode them, rather than sampling once
  • berr-reporting: control delivery of error state event interrupts (default: disabled)
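As an illustration, the options above map onto ip link arguments as sketched below. The helper only builds the argument list and does not run anything; its name and the interface name can0 are assumptions. Boolean options are rendered as on/off, and the interface typically has to be down while these settings are changed.

```python
def can_link_cmd(ifname, **options):
    """Build the argv for 'ip link set IFNAME type can ...' from keyword options."""
    argv = ["ip", "link", "set", ifname, "type", "can"]
    for key, value in options.items():
        flag = key.replace("_", "-")       # restart_ms -> restart-ms
        if isinstance(value, bool):
            argv += [flag, "on" if value else "off"]
        else:
            argv += [flag, str(value)]
    return argv


cmd = can_link_cmd("can0", bitrate=125000, restart_ms=100, one_shot=True)
print(" ".join(cmd))
# ip link set can0 type can bitrate 125000 restart-ms 100 one-shot on
```

The resulting list can be handed to subprocess.run as root, followed by ip link set can0 up to bring the interface up.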

can-utils

The can-utils suite implements basic programs atop SocketCAN:

  • candump: log frames to file
  • canplayer: replay logged frames to CAN interface
  • cansniffer: show IDs and changing data in real time (limited to 11-bit IDs)
  • cansend: transmit a CAN frame

The tools above all have ISO-TP variants in the can-isotp project, replacing "can" with "isotp".

Non-SocketCAN

  • Can4Linux -- avoid
  • Lincan -- best of a bad lot, might have lower latency than SocketCAN
  • Proprietary -- lol

Attacking CAN buses

It is first necessary to map sniffed CAN data to a semantic model of diagnostics and commands. This is not generally difficult if one has access to the generating ECUs. Upon connection to an active CAN network, many ECUs are likely to be periodically broadcasting. Analyze their transmit periods and perform standard differential analysis on the payloads. Manipulate various inputs and observe how the traffic changes. I've found it useful to graph IDs against time in a scatterplot, using the data length to size my points, and a continuous mapping of raw data bytes to RGB values for point color.
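A first pass at the period and differential analysis can be sketched as follows, assuming log lines in the "(timestamp) interface ID#DATA" form written by candump -l. The function names and the sample lines are made up for the example.

```python
import re
from collections import defaultdict

LOG_LINE = re.compile(r"\((\d+\.\d+)\)\s+\S+\s+([0-9A-Fa-f]+)#([0-9A-Fa-f]*)")


def parse_log(lines):
    """Map each CAN ID to its list of (timestamp, payload bytes)."""
    frames = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            stamp, can_id, payload = m.groups()
            frames[int(can_id, 16)].append((float(stamp), bytes.fromhex(payload)))
    return frames


def mean_periods(frames):
    """Average gap in seconds between consecutive frames of each ID."""
    periods = {}
    for can_id, seen in frames.items():
        stamps = [t for t, _ in seen]
        if len(stamps) > 1:
            gaps = [b - a for a, b in zip(stamps, stamps[1:])]
            periods[can_id] = sum(gaps) / len(gaps)
    return periods


def changing_bytes(seen):
    """Indices of payload bytes that take more than one value."""
    payloads = [p for _, p in seen]
    width = min(len(p) for p in payloads)
    return {i for i in range(width) if len({p[i] for p in payloads}) > 1}


log = [
    "(100.000000) vcan0 123#0011",
    "(100.100000) vcan0 123#0012",
    "(100.200000) vcan0 123#0013",
    "(100.050000) vcan0 456#FF",
]
frames = parse_log(log)
print(mean_periods(frames))
print(changing_bytes(frames[0x123]))
```

Bytes that never change are likely constants or padding; bytes that change only when an input is manipulated are the ones worth mapping.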

Once the CAN messages corresponding to your desired behavioral results have been solved, it's a matter of

  • Injecting them into the CAN network,
  • Having them reach the desired controller, and
  • Establishing dominance over a competing ECU

Injection is trivial, requiring nothing more than a physical connection and the correct data link settings (timing etc.). Reaching the desired controller is a matter of gateways and CAN networks. If a filtering gateway sits between your ingress and the target ECU, it will be necessary to render the intermediate device passive, or seize control of it via firmware modifications.

Establishing dominance is a matter of conflict; some data producer is presumably broadcasting a datum, perhaps "DOOR LOCKED". If we were to simply inject the same ID with the "DOOR UNLOCKED" message, it's likely to be quickly obsoleted by an ECU broadcast. Even if we do manage to take effect for a short time before being invalidated, the resultant behavior can range from oscillation, to lockup, to lockout of CAN-based control. There are two main strategies; more details are available from the papers linked below:

  • Knock out the conflicting ECU. Given physical access, it can simply be disconnected. It might be possible to put it in a "diagnostic" or "bootrom" mode where it ceases to transmit. Flooding the bus with garbage will often knock some ECUs offline, though this can also affect the target device.
  • Invalidate the conflicting source ECU's messages. Remember that an Active Error Frame will disrupt any ongoing message. If we are able to act quickly enough (almost certainly requiring hardware support, but nothing an FPGA can't solve), we can see the rival ECU emit the contested ID, and immediately invalidate their traffic. Indeed, this will usually be sufficient to cause the rival ECU to enter the error-passive or bus-off states described above.

A third strategy is to find patterns of behavior which cause the target or a gateway to ignore what seems to be invalid traffic. Some ECUs require nonces to change in order to act on a message, for instance; if we can predict these nonces, we can preempt the source ECU's traffic with our own.

See Also