Linguistic Features · spaCy Usage Documentation
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
Added: March 17, 2026
Introduction
spaCy Linguistic Features Overview
spaCy processes raw text into Doc objects containing rich linguistic annotations. To reduce memory usage, spaCy encodes strings as hash values, so an attribute such as Token.pos returns an integer ID; appending an underscore (e.g., Token.pos_) retrieves the readable string representation.
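As a minimal sketch of the integer-versus-string convention, the snippet below builds a Doc by hand on top of spacy.blank("en"), so no trained pipeline has to be downloaded. The sentence and the POS labels are assumptions supplied manually for illustration, not model predictions.

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline provides a vocab without downloading a model.
nlp = spacy.blank("en")

# Hand-assigned annotations (assumed for illustration, not predicted).
doc = Doc(nlp.vocab, words=["I", "love", "coffee"], pos=["PRON", "VERB", "NOUN"])
token = doc[2]

# Without the underscore: integer IDs. With the underscore: readable strings.
print(token.orth, token.orth_)
print(token.pos, token.pos_)

# The StringStore resolves a string to its hash and a hash back to its string.
text_to_hash = nlp.vocab.strings[token.orth_]
hash_to_text = nlp.vocab.strings[token.orth]
```

With a trained pipeline, the same attributes would be filled in by the pipeline components instead of by the Doc constructor.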
- Part-of-Speech (POS) Tagging: Uses statistical models to predict tags based on context. Fine-grained tags (Token.tag) provide detailed morphological info, while coarse-grained tags (Token.pos) provide general categories.
- Morphological Analysis:
  - Inflectional Morphology: Modifying a root form (lemma) with prefixes or suffixes that specify its grammatical function without changing its part of speech.
  - Morphologizer: A statistical component that assigns morphological features (Token.morph) and coarse-grained POS tags (Token.pos).
  - Rule-based approach: Used for languages with relatively simple morphological systems, such as English. It uses the token text and fine-grained tags to produce coarse-grained tags and morphological features.
- Lemmatization: The process of reducing words to their base form (lemma). spaCy offers three methods:
  - Lookup: Maps surface forms to lemmas via tables (requires spacy-lookups-data).
  - Rule-based: Uses language-specific rules and exception files (e.g., WordNet for English) based on POS/morphology.
  - Trainable (EditTreeLemmatizer): Learns form-to-lemma transformations from a lemma-annotated corpus, in many cases achieving higher accuracy than lookup and rule-based methods.
- Dependency Parsing:
  - Analyzes syntactic relationships between words, represented as a tree of "heads" and "children."
  - Noun Chunks: Identifies "base noun phrases" (a noun plus the words describing it) via Doc.noun_chunks.
  - Navigation: Provides an API to traverse the dependency tree, check for annotations (doc.has_annotation("DEP")), and extract syntactic relations.
- Data Management: Lemmatization tables and lookup data are distributed via the spacy-lookups-data package.
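To make the tagging and morphology attributes from the list concrete, here is a small sketch that fills Token.tag_, Token.pos_ and Token.morph by hand through the Doc constructor. The tags and feature strings are assumed values standing in for what a tagger or morphologizer might predict; no statistical component is run here.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # vocab only; no tagger or morphologizer is run

# Hand-written annotations standing in for model output (assumed values).
doc = Doc(
    nlp.vocab,
    words=["I", "was", "reading"],
    tags=["PRP", "VBD", "VBG"],
    pos=["PRON", "AUX", "VERB"],
    morphs=[
        "Case=Nom|Number=Sing|Person=1|PronType=Prs",
        "Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin",
        "Aspect=Prog|Tense=Pres|VerbForm=Part",
    ],
)

for token in doc:
    # tag_ is fine-grained, pos_ is coarse-grained, morph holds the features.
    print(token.text, token.tag_, token.pos_, token.morph)

# Individual features come back as a list of values.
pron_type = doc[0].morph.get("PronType")
```

The fine-grained tag VBG and the coarse-grained tag VERB describe the same token at two levels of detail, while the morphological features carry the inflectional information separately.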
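The difference between lookup and rule-based lemmatization can be illustrated with a toy sketch in plain Python. This is not spaCy's implementation: the tables, rules, and function names below are hypothetical and far smaller than real lemmatization data. The point is only that a lookup ignores context, while rules depend on the part of speech.

```python
# Toy data: hypothetical tables, far smaller than anything a real library ships.
LOOKUP_TABLE = {"mice": "mouse", "was": "be", "running": "run"}

# (suffix, replacement) rules per coarse POS, tried in order.
SUFFIX_RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}

# Irregular forms are checked before the suffix rules.
EXCEPTIONS = {
    "NOUN": {"mice": "mouse"},
    "VERB": {"was": "be", "went": "go"},
}


def lookup_lemma(word: str) -> str:
    """Lookup strategy: no context, fall back to the lowercased surface form."""
    word = word.lower()
    return LOOKUP_TABLE.get(word, word)


def rule_lemma(word: str, pos: str) -> str:
    """Rule strategy: exceptions first, then the first matching suffix rule for the POS."""
    word = word.lower()
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
    return word


print(lookup_lemma("meeting"))        # not in the table, returned unchanged
print(rule_lemma("meeting", "VERB"))  # verb rule strips "ing"
print(rule_lemma("meeting", "NOUN"))  # no noun rule applies
```

A trainable lemmatizer replaces the hand-written rules with transformations learned from annotated data, which is why it needs a corpus rather than rule files.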
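The dependency tree navigation described in the list can be sketched on a hand-built parse. The heads, dependency labels, and POS tags below are written manually as an assumption, in place of a trained parser's output; heads are given as absolute token indices, and the root token points to itself.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # supplies the English vocab and noun-chunk rules

words = ["Autonomous", "cars", "shift", "insurance", "liability", "toward", "manufacturers"]

# Hand-written parse (assumed values, not predicted by a model).
doc = Doc(
    nlp.vocab,
    words=words,
    pos=["ADJ", "NOUN", "VERB", "NOUN", "NOUN", "ADP", "NOUN"],
    heads=[1, 2, 2, 4, 2, 2, 5],
    deps=["amod", "nsubj", "ROOT", "compound", "dobj", "prep", "pobj"],
)

# Check that dependency annotation is present before relying on it.
has_parse = doc.has_annotation("DEP")

for token in doc:
    # Each token has exactly one head and zero or more children.
    print(token.text, token.dep_, token.head.text, [child.text for child in token.children])

# The root is the token whose head is itself.
root = next(token for token in doc if token.head.i == token.i)
root_children = [child.text for child in root.children]

# Base noun phrases derived from the parse and POS tags.
chunks = [chunk.text for chunk in doc.noun_chunks]
print(chunks)
```

Because Doc.noun_chunks is computed from the dependency parse, it is only available when the Doc carries dependency annotation, which is what the has_annotation check guards against.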