Using HuggingFace Datasets

To use a HuggingFace dataset or model on Unity, you need to point the HuggingFace cache environment variable (HF_HOME) at the shared dataset directory. If you need a HuggingFace model, first check whether it is already available under Unity AI Datasets. If the model is unavailable, please contact the Unity team.
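For example, a quick way to see which model caches exist on the cluster is to list the shared dataset directory (a minimal sketch; the Unity AI Datasets page remains the authoritative list):

import os

# List the shared AI model/dataset caches available under /datasets/ai.
for name in sorted(os.listdir("/datasets/ai")):
    print(name)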

Example

To load the Llama-3.2-3B model, which is stored at /datasets/ai/llama3, point the HuggingFace environment variable at that directory.
There are three supported ways for the huggingface library to find the model.

Option 1: Set the environment variable in your terminal before running the script.
Note that Option 1 does not work from inside a notebook cell; run the export command in a terminal.

export HF_HOME=/datasets/ai/llama3

# HF_HOME=/datasets/ai/llama3 was exported in the terminal before running this script
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
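To confirm that Option 1 picked up the shared cache, one optional check (not from the original page) is to scan the cache with huggingface_hub after exporting HF_HOME:

from huggingface_hub import scan_cache_dir

# With HF_HOME=/datasets/ai/llama3 exported, this lists the repos in the shared cache.
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.size_on_disk_str)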

Option 2: Use the absolute model path directly.
HuggingFace stores model parameters, tokenizer files, and config files in a specific directory structure, so you can pass the snapshot path directly:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = '/datasets/ai/llama3/hub/models--meta-llama--Llama-3.2-3B/snapshots/13afe5124825b4f3751f836b40dafda64c1ed062'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
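The snapshot hash in this path identifies the cached revision. If you do not know it ahead of time, a short sketch like the following can locate the snapshot directory (illustrative; assumes exactly one snapshot is cached):

import glob

# Find the cached snapshot directory for Llama-3.2-3B under the shared cache.
snapshots = glob.glob("/datasets/ai/llama3/hub/models--meta-llama--Llama-3.2-3B/snapshots/*")
model_path = snapshots[0]
print(model_path)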

Option 3: Set HF_HOME inside the script.
The line os.environ["HF_HOME"] = "/datasets/ai/llama3" must come before the transformers import, since the cache location is read when the library is imported.
Note that Option 3 does not work reliably in a notebook cell; run the script from a terminal.

import os

os.environ["HF_HOME"] = "/datasets/ai/llama3" # <- This line is placed above "from transformers.."
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
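Whichever option you use, the loaded tokenizer and model behave like any other transformers model. A minimal generation sketch (the prompt and generation settings are illustrative, not from the original page):

# Tokenize a prompt, generate a short continuation, and decode it.
inputs = tokenizer("The Unity cluster is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))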

See HuggingFace’s cache management documentation for more information.

For usage details, click the “Use this model” button on the HuggingFace model page.

HuggingFace model page (screenshot)
