MF4 Decoders - DBC Decode CAN Data to CSV/Parquet [Data Lakes]

MF4 decoders - DBC decode CAN bus and LIN bus data to physical values


Need to DBC decode your CAN/LIN data to CSV/Parquet files?

The CANedge records raw CAN/LIN data to an SD card in the popular ASAM MDF format (MF4).

The simple-to-use MF4 decoders let you DBC decode your log files to CSV or Parquet files - enabling easy creation of powerful data lakes and integration with 100+ software/API tools.

Learn more below - and try the decoders yourself!

DBC DECODE

DBC decode your MF4 files to interoperable CSV/Parquet files

DRAG & DROP

Drag & drop files/folders onto the decoder to process them

AUTOMATE

Optionally use via the command line or in scripts for automation

DATA LAKE

Easily create powerful Parquet data lakes for use in 100+ tools

WINDOWS/LINUX

Decoders can be used on both Windows and Linux operating systems

100% FREE

The decoders are 100% free and can be integrated into your own solutions


The ASAM MDF (Measurement Data Format) is a popular, open and standardized format for storing bus data e.g. from CAN bus (incl. J1939, OBD2, CAN FD etc) and LIN bus.

The CANedge records raw CAN/LIN data in the latest standardized version, MDF4 (*.MF4). The log file format is ideally suited for pro-spec CAN logging at high bus loads and enables both lossless recording and 100% power safety. Further, the CANedge supports embedded encryption and compression of the log file data (both natively supported by the MF4 decoders).

The raw MF4 data from the CANedge can be loaded natively in various software/API tools, including the asammdf GUI/API, our Python API, the MF4 converters - and the MF4 decoders (detailed in this article).

To learn more about the MDF4 file format, see our MF4 intro.




The MF4 decoders can be deployed in numerous ways - below are two examples:

Example 1: Local PC drag & drop usage

  • A CANedge records raw CAN/LIN data to an SD card
  • A log file folder from the SD card is copied to a PC
  • The relevant DBC files are placed next to the MF4 decoder
  • The data is DBC decoded to CSV via drag & drop
  • The CSV files can be directly loaded in e.g. Excel

Example 2: Cloud based auto-processing

  • A CANedge uploads raw CAN/LIN data to an AWS S3 bucket
  • When a log file is uploaded it triggers a Lambda function
  • The Lambda uses the MF4 decoder and DBC files from S3
  • Via the Lambda, the uploaded file is decoded to Parquet files
  • The Parquet files are written to an AWS S3 'output bucket'
  • The data lake can be visualized in e.g. Grafana dashboards

Learn more in our Grafana dashboard article.
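
To illustrate the cloud setup, below is a minimal Python sketch of an AWS Lambda handler triggered by an S3 upload. The bucket names, paths and binary/DBC file names are illustrative assumptions - it is assumed that the decoder binary and your DBC file(s) are bundled with the Lambda function:

import os
import subprocess
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-output-bucket"  # assumed name of the Parquet 'output bucket'

def lambda_handler(event, context):
    # Identify the uploaded MF4 log file from the S3 event trigger
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Download the log file to the Lambda's temporary storage
    local_mf4 = os.path.join("/tmp/input", os.path.basename(key))
    os.makedirs("/tmp/input", exist_ok=True)
    s3.download_file(bucket, key, local_mf4)

    # DBC decode the file to Parquet (decoder + DBC files bundled with the function)
    subprocess.run(["./mdf2parquet_decode", "-i", "/tmp/input", "-O", "/tmp/output"], check=True)

    # Write the resulting Parquet files to the output bucket
    for root, _, files in os.walk("/tmp/output"):
        for name in files:
            path = os.path.join(root, name)
            s3.upload_file(path, OUTPUT_BUCKET, os.path.relpath(path, "/tmp/output"))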


The decoders support DBC files for practically any CAN bus, CAN FD or LIN bus protocol. This includes e.g. OBD2, J1939, NMEA 2000, ISOBUS, CANopen and proprietary OEM-specific DBC files. For details, see the MF4 decoder docs. For each DBC file, you specify which CAN/LIN channel to apply it to (e.g. can1, can2, lin1 etc) and you can provide multiple DBC files per channel.
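
As a sketch, the snippet below shows how multiple DBC files could be assigned to specific channels when calling the Parquet decoder from Python. Note that the channel flag names and DBC file names are illustrative assumptions - refer to the MF4 decoder docs for the exact command line syntax:

import subprocess

# Assign DBC files per channel (flag names below are assumptions - see the docs)
subprocess.run([
    "mdf2parquet_decode.exe",
    "--dbc-can1=j1939.dbc",        # DBC file applied to CAN channel 1
    "--dbc-can2=obd2.dbc",         # DBC file applied to CAN channel 2
    "--dbc-lin1=lin_sensors.dbc",  # DBC file applied to LIN channel 1
    "-i", "input",
    "-O", "output",
])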


The MF4 decoders enable you to output DBC decoded CAN/LIN data as either CSV files or Parquet files. Below we briefly outline the key differences between the formats:

  • CSV files are simple, text-based and universally compatible - ideal for small to medium-sized datasets and ad hoc analyses. They are easy to use, but less efficient for large datasets
  • Parquet files are binary and offer vastly better performance and storage efficiency vs. CSV - but require more specialized tools for analysis
  • While CSV is straightforward for basic tasks, Parquet is generally preferable for performance-intensive analysis and handling large-scale time series data

For most use cases, we recommend using the Parquet file format when both options are available - and many of our plug & play integrations are built around Parquet data lakes for the above reasons.
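
To illustrate, both output formats can be loaded in a few lines of Python (file names below are placeholders, and reading Parquet requires e.g. the pyarrow package):

import pandas as pd

# Load a DBC decoded CSV file (simple and universal, but slower and larger on disk)
df_csv = pd.read_csv("CAN1_Speed.csv")

# Load the equivalent Parquet file (compact, typed and much faster to read)
df_parquet = pd.read_parquet("CAN1_Speed.parquet")

print(df_parquet.head())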





DBC decode raw MF4 data via drag & drop

The simple-to-use MF4 decoders let you drag & drop CANedge log files with raw CAN/LIN data to DBC decode them using your own DBC file(s) - outputting the data as CSV or Parquet files.

Batch decode (nested) folders

You can also drag & drop entire folders of MF4 log files onto a decoder to batch process the files. This also works for nested folders with e.g. thousands of log files.

Automate decoding via CLI/scripts

The decoder executables can be called via the CLI or from any programming language. Ideal for automated DBC decoding locally, in the cloud (e.g. in AWS Lambda), on Raspberry Pis etc.

import subprocess

# DBC decode all MF4 log files in the 'input' folder, writing the output files to the 'output' folder
subprocess.run(["mdf2parquet_decode.exe", "-i", "input", "-O", "output"])















Easily use with S3 storage

The CANedge2/CANedge3 upload data to your own S3 server. Mount your S3 bucket and use the MF4 decoders as if files were stored locally. Or use in e.g. AWS Lambda for full automation.

Easily decompress and/or decrypt your raw data

The CANedge supports embedded compression and encryption of log files on the SD card. The MF4 decoder natively supports compressed/encrypted files, simplifying post processing.

Create powerful Parquet data lakes

The decoders are ideal for creating powerful Parquet data lakes with an efficient date-partitioned structure of concatenated files - stored locally or e.g. on S3.
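
As a sketch, such a data lake can be consumed as a single dataset via e.g. pyarrow - the folder name and signal columns below are illustrative assumptions:

import pyarrow.dataset as ds

# Treat the entire date-partitioned folder of Parquet files as one dataset
dataset = ds.dataset("datalake/", format="parquet")

# Load only the columns needed (column names are placeholders)
table = dataset.to_table(columns=["t", "Speed"])
print(table.to_pandas().head())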

Visualize your CAN/LIN data in Grafana dashboards

Many dashboard tools can query data from Parquet data lakes via SQL interfaces (like Athena or ClickHouse), enabling low-cost, scalable visualization - see e.g. our Grafana-Athena intro.
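
As a minimal example of using such an interface programmatically, the sketch below starts an SQL query via Amazon Athena from Python (the database, table and result bucket names are assumptions). In a Grafana setup, the dashboard panels would instead send similar SQL queries through the Athena data source:

import boto3

athena = boto3.client("athena")

# Run an SQL query against an Athena table mapped to the Parquet data lake
response = athena.start_query_execution(
    QueryString="SELECT t, Speed FROM canedge_datalake.can1_speed LIMIT 100",
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
print(response["QueryExecutionId"])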

Use Python to analyze Parquet data lakes

Python supports Parquet data lakes, enabling e.g. big data analysis. With S3 support, you can also analyze data directly in e.g. Colab Jupyter Notebooks. See the docs for script examples.
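
For example, a Parquet file can be loaded into pandas directly from your S3 data lake (requires the s3fs package; the bucket and path below are placeholders):

import pandas as pd

# Read a Parquet file directly from an S3 based Parquet data lake
df = pd.read_parquet("s3://my-datalake/AABBCCDD/CAN1_Speed/2024/06/01/part-0.parquet")

# Quick statistical overview of the decoded signals
print(df.describe())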


Use MATLAB to analyze Parquet data lakes

MATLAB natively supports Parquet data lakes - making it easy to perform advanced analysis at scale with support for S3 and out-of-memory tall arrays. See the docs for script examples.


Use Excel or Power BI to analyze your data lakes

Excel and Power BI let you load DBC decoded CSV/Parquet files for quick analysis - or use e.g. Athena/ClickHouse ODBC drivers to query data (beyond memory) from your data lakes via SQL.

Easily analyze data via ChatGPT

ChatGPT is great for analysing large amounts of DBC decoded CAN/LIN data in CSV format. Learn more in our intro.

Want to try this yourself? Download the decoders and MF4 sample data below:





Store your data lake anywhere - and integrate with everything


Parquet data lake integration examples

Parquet data lakes combine low-cost, flexible storage with efficient and interoperable integration options.

Agnostic low cost storage

Parquet data lakes consist of compact, efficient binary files - meaning they can be stored at extremely low cost in any cloud file storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Storage), self-hosted S3 buckets (e.g. MinIO) - or simply on your local disk. Storing Parquet files in e.g. AWS S3 is typically 95% lower cost vs. storing the equivalent data volume in a database.

Native Parquet support

As illustrated, Parquet data lakes are natively supported by a wide array of tools. For example, you can directly work with Parquet files within any programming language like Python/MATLAB - whether the files are stored locally or on S3. Further, Parquet files can be natively loaded in many desktop tools like Microsoft Power BI or Tableau Desktop.

Powerful interfaces

Parquet data lakes are natively supported by 'interfaces' like Amazon Athena, Google BigQuery, Azure Synapse and open source options like ClickHouse and DuckDB. These expose SQL query interfaces and ODBC/JDBC drivers that dramatically expand integration options - and supercharge query speed. You can for example use interfaces to visualize your data in Grafana dashboards.


Parquet data lakes consist of files, meaning they can be stored in file storage solutions like e.g. AWS S3. Storing data on S3 is incredibly low cost ($0.023/GB/month) compared to most databases (typically ~$1.5/GB/month), which is relevant as many CAN/LIN data logging use cases can require terabytes of storage over time.

The 'downside' to storing files on S3 vs. in a database is generally that querying the data - e.g. for analytics or visualization - is much slower. However, this is where interface tools like Amazon Athena come into play, as outlined below.


We refer to tools like Amazon Athena, Google BigQuery and Azure Synapse Analytics as 'interfaces' for simplicity. They can also be referred to as cloud based serverless data warehouse services. The serverless part is important: It means that it's simple to set up - and you pay only when you query data.

Automotive OEM engineers often need to store terabytes of data for analysis - yet they may only need to access small subsets of the data on an infrequent basis. When they do access the data, however, the query speed has to be fast - even if they query gigabytes of data. Tools like Amazon Athena are ideally suited for this. When you query the data, Athena spins up the necessary compute and parallelization in real-time - meaning you can extract insights across gigabytes of your S3 data lake in seconds using standard SQL queries. At the same time, all the complexity is abstracted away - and the solutions can be automatically deployed as per our step-by-step guides.


There are too many software/API integration examples for us to list exhaustively - but below is a more extensive recap of tools:

Direct integration examples

Below are examples of tools that can directly work with Parquet files:

  • MATLAB: Natively supports local or S3 based Parquet data lakes, with powerful support for out-of-memory tall arrays
  • Python: Natively supports local or S3 based Parquet data lakes and offers libraries for key interfaces (Athena, ClickHouse etc.)
  • Power BI: Supports reading Parquet files from the local filesystem, Azure Blob Storage, and Azure Data Lake Storage Gen2
  • Tad: Free desktop tool for viewing and analyzing tabular data incl. Parquet files. Useful for ad hoc review of your data
  • Apache Spark: A unified analytics engine for large-scale data processing that supports Parquet files
  • Databricks: A platform for massive scale data engineering and collaborative data science, supporting Parquet files
  • Tableau: A data visualization tool that can connect to Parquet files through Spark SQL or other connectors
  • Apache Hadoop: Supports Parquet file format for HDFS and other storage systems
  • PostgreSQL: With the appropriate extensions, it can query Parquet files
  • Cloudera: Offers a platform that includes Parquet file support
  • Snowflake: A cloud data platform that can load and query Parquet files
  • Microsoft SQL Server: Can access Parquet files via PolyBase
  • MongoDB: Can import data from Parquet files using specific tools and connectors
  • Teradata: Supports querying Parquet files using QueryGrid or other connectors
  • Apache Drill: A schema-free SQL Query Engine for Hadoop, NoSQL, and Cloud Storage, which supports Parquet files
  • Vertica: An analytics database that can handle Parquet file format
  • IBM Db2: Can integrate with tools to load and query Parquet files

Interface based integrations

Below are examples of tools that can integrate via interfaces like Athena, BigQuery, Synapse, ClickHouse etc.:

  • Power BI (driver): By installing a JDBC/ODBC driver (for e.g. Athena), you can use SQL to query your data lake
  • Excel (driver): By installing a JDBC/ODBC driver (for e.g. Athena), you can use SQL to query your data lake
  • Grafana: Offers powerful and elegant dashboards for data visualization, ideal for visualizing decoded CAN/LIN data
  • Tableau: Known for its interactive data visualization capabilities, especially popular for business intelligence applications
  • Looker: Employs an analytics-oriented application framework, including business intelligence and data exploration features
  • Google Data Studio: Customizable reports/dashboards, known for user-friendly design and integration with Google services
  • AWS QuickSight: A fast, cloud-powered business intelligence service that integrates easily with e.g. Amazon Athena
  • Apache Superset: An open-source data exploration and visualization platform written in Python
  • Deepnote: A collaborative data notebook built for teams to discover and share insights
  • Rocket BI: An end-to-end solution for discovering business insights from big data
  • Zing Data: A data exploration and visualization platform supporting e.g. ClickHouse
  • Explo: Customer-facing analytics for any platform, designed for beautiful visualization and engineered for simplicity
  • Metabase: An easy-to-use, open source UI tool for asking questions about your data
  • Qlik: Offers end-to-end, real-time data integration and analytics solutions, known for the associative exploration user interface
  • Domo: Combines a powerful back-end with a user-friendly front-end, ideal for consolidating data systems into one platform
  • Sisense: Known for its drag-and-drop user interface, enabling easy creation of complex data models and visualizations
  • MicroStrategy: Offers a comprehensive suite of BI tools, emphasizing mobile analytics and hyper-intelligence features
  • Splunk: Specializes in processing and analyzing machine-generated big data via a web-style interface
  • Exasol: Offers a high-performance, in-memory, MPP database designed for analytics and fast data processing
  • Alteryx: Provides an end-to-end platform for data science and analytics, facilitating easy data blending and advanced analytics
  • SAP Analytics Cloud: Offers business intelligence, augmented analytics, predictive analytics, and enterprise planning
  • IBM Cognos Analytics: Integrates AI to help users visualize, analyze, and share actionable business insights
  • GoodData: Provides cloud-based tools for big data and analytics, with a focus on enterprise-level data management and analysis
  • Dundas BI: Offers flexible dashboards, reporting, and analytics features, allowing for tailored BI experiences
  • Yellowfin BI: Delivers business intelligence tools and a suite of analytics products with collaborative features for sharing insights
  • Reveal: Provides embedded analytics and a user-centric design, making data more accessible for decision makers and teams
  • Chartio: A cloud-based data exploration tool, known for its ease of use and ability to blend data from multiple sources





Visualize data in Grafana dashboards

Want to create dashboard visualizations across all of your CAN/LIN data?

The CANedge2/CANedge3 is ideal for collecting CAN/LIN data to your own server (cloud or self-hosted). A common requirement for OEMs and system integrators is the ability to create dashboards for visualizing the decoded data. Here, the MF4 decoders can automate the creation of Parquet data lakes at any scale (from MB to TB) stored on S3 - ready for visualization via Grafana dashboards. Learn more in our dashboard article.





Analyze fleet performance in MATLAB/Python

Need to perform advanced large scale analyses of your data?

The CANedge3 lets you record raw CAN data to an SD card and auto-push it to your own S3 server via 3G/4G. Uploaded files can be DBC decoded to a Parquet data lake, output into a separate S3 bucket. This makes it easy to perform advanced statistical analysis via MATLAB or Python, as both natively support loading Parquet data lakes stored on S3. In turn, this lets you perform advanced analyses - with minimal code. See our script examples to get started.




Quickly analyse data as CSV via Excel

Need to swiftly review your DBC decoded data?

The MF4 decoders can be useful to quickly understand what can be DBC decoded from your raw CAN/LIN data. By simply drag & dropping your LOG/ folder from the CANedge SD card, you can create a copy in DBC decoded CSV form - and directly load this data for analysis in Excel. If you wish to perform more efficient analysis of large amounts of data in Excel, you can alternatively use an ODBC driver via e.g. Athena, DuckDB or ClickHouse - enabling efficient out-of-memory analyses.
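
To illustrate the out-of-memory approach, DuckDB can query an entire folder of decoded Parquet files with plain SQL directly from Python (the data lake path and column names below are placeholders) - and the same engine can be exposed to Excel/Power BI via its ODBC driver:

import duckdb

# Aggregate across all Parquet files in the data lake without loading them into memory first
result = duckdb.sql("""
    SELECT date_trunc('hour', t) AS hour, avg(Speed) AS avg_speed
    FROM 'datalake/**/*.parquet'
    GROUP BY 1
    ORDER BY 1
""").df()

print(result.head())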





Create a self-hosted multi-purpose Parquet data lake

Need a 100% self-hosted Parquet data lake - using open source tools only?

If you prefer to self-host everything, you can e.g. deploy a CANedge2/CANedge3 to upload data to your own self-hosted MinIO S3 bucket (100% open source) running on your own Windows/Linux machine (or e.g. a virtual machine in your cloud). You can run a cron job to periodically process new MF4 log files and output the result to your Parquet data lake. The Parquet files can be analysed directly via Python. Further, you can integrate it with an open source tool like ClickHouse for ODBC driver integrations or dashboard visualization via Grafana dashboards.
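
Below is a rough sketch of the kind of Python script such a cron job could run - syncing MF4 log files from a self-hosted MinIO bucket, decoding them and uploading the Parquet output to a 'data lake' bucket. The endpoint, credentials, bucket names and paths are all placeholders (and tracking of already-processed files is omitted):

import os
import subprocess
import boto3

# MinIO is S3 compatible, so boto3 can simply be pointed at the self-hosted endpoint
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # placeholder MinIO endpoint
    aws_access_key_id="minioadmin",        # placeholder credentials
    aws_secret_access_key="minioadmin",
)

# Download the MF4 log files from the upload bucket
os.makedirs("input", exist_ok=True)
for obj in s3.list_objects_v2(Bucket="canedge-upload").get("Contents", []):
    key = obj["Key"]
    if key.upper().endswith(".MF4"):
        s3.download_file("canedge-upload", key, os.path.join("input", key.replace("/", "_")))

# DBC decode the downloaded files into a local Parquet data lake folder
subprocess.run(["./mdf2parquet_decode", "-i", "input", "-O", "datalake"], check=True)

# Upload the resulting Parquet files to the data lake bucket
for root, _, files in os.walk("datalake"):
    for name in files:
        path = os.path.join(root, name)
        s3.upload_file(path, "canedge-datalake", os.path.relpath(path, "datalake"))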



FAQ


Yes, you control 100% of how you create and store your CSV/Parquet data lake.

In our examples, we frequently assume a setup where your CANedge2/CANedge3 uploads data to an AWS S3 bucket - with uploaded data automatically processed via AWS Lambda functions. This is a common setup that we provide plug & play solutions for - hence it will often be the simplest way to deploy your data processing and data lake.

However, you can set this up in any way you want. For example, you might store your uploaded log files on a self-hosted MinIO S3 bucket instead. In such a scenario, you can periodically process new MF4 log files manually (e.g. via drag & drop or the CLI) to update your data lake - or you can set up e.g. a cron job or similar service to handle this. The data lake can be stored in another MinIO S3 bucket - and you can then directly work with the data lake from here (e.g. in MATLAB/Python) or integrate the data using an open source system like ClickHouse or DuckDB.

The same principle applies if you upload data to Google Cloud Storage or Azure Blob Storage (via an S3 gateway) - here you can use their native data processing services to deploy the MF4 decoders if you wish to fully automate the DBC decoding of incoming data. We do not, however, provide plug & play solutions for deploying this.

Of course, you can also simply use the MF4 decoders locally to create a locally stored Parquet data lake. This will often suffice if you're e.g. using a CANedge1 to record your CAN/LIN data - and you simply wish to process this data on your own PC. In such use cases, the Parquet data lake can still be a powerful tool, since it makes it much easier to perform large-scale data processing compared to tools like the asammdf GUI.


We provide two types of MF4 executables for use with the CANedge: The MF4 converters and MF4 decoders.

The MF4 converters let you convert the MDF log files to other formats like Vector ASC, PEAK TRC and CSV. These converters do not perform any form of DBC decoding of the raw CAN/LIN data - they only change the file format.

The MF4 decoders are very similar in functionality to the MF4 converters. However, these executables DBC decode the log files to physical values, outputting them as either CSV or Parquet files. When using the MF4 decoders, you provide your DBC file(s) to enable the decoding. These executables are ideal if your goal is to analyse the data in human-readable form and/or e.g. create 'data lakes' for analysing/visualizing the data at scale.


No, you simply download the executables - no installation is required.


As evident, there are almost limitless options for how you deploy your MF4 decoders, how you store the resulting data lake, how you provide interfaces for it - and which software/API tools you integrate it with.

You can use the CANedge and our MF4 decoders to facilitate any of these deployment setups. However, our team offers step-by-step guides and technical support only for a limited subset of the deployments, such as our Grafana-Athena integration.






Need an interoperable CAN logger?

Get your CANedge today!






