Natural Language Processing (NLP)

NLP use cases are characterized by large unstructured data sets that can create performance bottlenecks as they scale.

Sentiment Analysis

The example below shows a market sentiment NLP implementation in Pachyderm. It uses transfer learning to fine-tune a BERT language model to classify financial text by sentiment.

It shows how to combine inputs from separate sources and incorporates data labeling, model training, and data visualization.
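As a rough illustration of combining inputs from separate sources, the sketch below builds a Pachyderm pipeline spec as a Python dictionary and prints it as JSON. The field names (`pipeline`, `input`, `cross`, `pfs`, `glob`, `transform`) follow the Pachyderm pipeline specification as we understand it. The repo names `labeled_data` and `language_model`, the container image, and the `train.py` script are hypothetical placeholders, not names taken from the example repository.

```python
import json


def make_training_pipeline_spec():
    """Build a pipeline spec that reads from two separate input repos."""
    return {
        "pipeline": {"name": "train_model"},  # hypothetical pipeline name
        "description": "Fine-tune a sentiment model on labeled financial text.",
        "input": {
            # "cross" pairs each datum of the first input with each datum of
            # the second. With glob "/", each repo is treated as one datum,
            # so the job sees both repos together.
            "cross": [
                {"pfs": {"repo": "labeled_data", "glob": "/"}},    # hypothetical repo
                {"pfs": {"repo": "language_model", "glob": "/"}},  # hypothetical repo
            ]
        },
        "transform": {
            "image": "example/finbert-train:latest",  # hypothetical image
            # Input repos are mounted under /pfs/<repo>; results go to /pfs/out.
            "cmd": [
                "python", "train.py",  # hypothetical script
                "--data", "/pfs/labeled_data",
                "--model", "/pfs/language_model",
                "--out", "/pfs/out",
            ],
        },
    }


spec = make_training_pipeline_spec()
print(json.dumps(spec, indent=2))
```

Saving the printed JSON to a file and running `pachctl create pipeline -f <file>` would be the usual way to register such a spec, assuming the repos and image exist in your cluster.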

FinBERT is a pre-trained NLP model adapted to analyze the sentiment of financial text. The underlying BERT-based language model was trained on a large Reuters corpus.

View the code on GitHub

Key Features of Pachyderm

Pachyderm is cost-effective at scale and enables data engineering teams to automate complex pipelines with sophisticated data transformations.


Deliver reliable results faster and maximize developer efficiency.

Automated diff-based data-driven pipelines.

Deduplication of data saves infrastructure costs.
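The two ideas above, diff-based processing and deduplication, can be sketched conceptually in a few lines of Python. This is a toy model under our own assumptions, not Pachyderm's actual storage or scheduling code: the `ContentStore`, `snapshot`, and `changed_paths` names are invented for illustration, and the file contents are made up.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content address: the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Keeps each distinct blob once, keyed by its digest (hypothetical helper)."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = digest(data)
        self.blobs.setdefault(key, data)  # identical content is stored only once
        return key


def snapshot(store, files):
    """Record one version of a dataset as a mapping of path to digest."""
    return {path: store.put(data) for path, data in files.items()}


def changed_paths(old, new):
    """Paths that are new, or whose content differs, between two snapshots."""
    return sorted(p for p, h in new.items() if old.get(p) != h)


store = ContentStore()
v1 = snapshot(store, {
    "/news/a.txt": b"Shares rose 4%.",
    "/news/b.txt": b"Profit fell.",
})
v2 = snapshot(store, {
    "/news/a.txt": b"Shares rose 4%.",       # unchanged
    "/news/b.txt": b"Profit fell sharply.",  # modified
    "/news/c.txt": b"Shares rose 4%.",       # new path, same bytes as a.txt
})

# Only the modified and new paths would need reprocessing.
print(changed_paths(v1, v2))  # ['/news/b.txt', '/news/c.txt']
# Five file versions were written, but only three distinct blobs are kept.
print(len(store.blobs))       # 3
```

The design point is that both savings come from the same mechanism: once data is addressed by content, unchanged inputs are cheap to detect and duplicate bytes are never stored twice.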


Immutable data lineage ensures compliance.

Data versioning of all data types and metadata. 

Familiar git-like structure of commits, branches, & repos.
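To make the git-like model of repos, branches, and commits more concrete, here is a minimal sketch in Python. It is an illustrative toy, not Pachyderm's data model or API: the `Repo` and `Commit` classes, the commit ID scheme, and the sample files are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot of files with a link to its parent commit."""
    id: str
    files: Tuple[Tuple[str, str], ...]  # sorted (path, content) pairs
    parent: Optional["Commit"] = None


class Repo:
    """A named collection of branches, each pointing at its latest commit."""

    def __init__(self, name: str):
        self.name = name
        self.branches: Dict[str, Commit] = {}
        self._count = 0

    def commit(self, branch: str, files: Dict[str, str]) -> Commit:
        """Create a new commit on a branch; earlier commits are left untouched."""
        self._count += 1
        new = Commit(
            id=f"{self.name}-{self._count}",  # hypothetical ID scheme
            files=tuple(sorted(files.items())),
            parent=self.branches.get(branch),
        )
        self.branches[branch] = new  # the branch is just a movable pointer
        return new

    def history(self, branch: str) -> List[str]:
        """Walk parent links from the branch head back to the first commit."""
        ids = []
        current = self.branches.get(branch)
        while current is not None:
            ids.append(current.id)
            current = current.parent
        return ids


repo = Repo("headlines")
first = repo.commit("master", {"/a.txt": "Shares rose."})
second = repo.commit("master", {"/a.txt": "Shares rose.", "/b.txt": "Profit fell."})

print(repo.history("master"))  # ['headlines-2', 'headlines-1']
print(dict(first.files))       # the earlier version is still readable as committed
```

Because each commit is frozen and keeps a parent link, any result can be traced back to the exact version of the data that produced it, which is the property the lineage and compliance claims rest on.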


Leverage existing infrastructure investment.

Language agnostic: use any language to process data.

Data agnostic: unstructured, structured, batch, & streaming.

