Dynamic Indexing

Introduction

A patent digest is a curated set of patent documents that all disclose a common technical concept. Multiple patent digests can be arranged into a collection that elucidates various facets of a given technology. The websites patentdigests.net and patentdigest.com host several of these collections, called Community Digest Collections. The structure of these Community Digest Collections is informed by an approach called Dynamic Indexing. Dynamic Indexing draws on ideas from library science, computer science, and patent prosecution practice to create a system that makes indexing of patent documents into a taxonomy of facets more efficient and more effective. For searching these facets, Dynamic Indexing introduces a new approach called the Power Stack. This paper discusses various features of Dynamic Indexing.

Patent Documents

First, what is a patent document, how is it structured, and what are the different types of patent documents?

  • A patent records an invention and the legal right to exclude others from making, using, or selling it, in exchange for public disclosure of how it works.
  • It serves both as a technical teaching for the public and as an enforceable basis for asserting patent rights.

There are many different national patent systems around the world, each with its own legal infrastructure and document types. For the purposes of this paper, we will focus on United States patent-related publications.

United States patent-related publications include several document types that cover applications, granted patents, and later changes such as corrections or reexaminations.

Main patent types

  • Utility patents protect new and useful processes, machines, manufactures, or compositions of matter, and are the most common U.S. patent type.
  • Design patents protect new, original, and ornamental designs for an article of manufacture, focusing on appearance rather than function.
  • Plant patents protect new and distinct asexually reproduced plants, such as cultivated sports, mutants, or hybrids.

Application publications

  • Pre‑grant patent application publications are typically published 18 months after the earliest filing or priority date and usually carry kind code A1.
  • Republished or corrected application publications (e.g., A2, A9) reflect updates or corrections to the originally published application.

Granted patent publications

  • Granted utility patents are published with kind codes such as B1 (no prior pre‑grant publication) or B2 (with pre‑grant publication).
  • Granted design and plant patents also publish as separate patent documents, often indicated by kind code S for design patents and P2 or P3 for plant patents.

Post‑grant publications

  • Reissue patents (E1, E) are new patent publications that replace an earlier, still‑unexpired patent to correct errors without adding new matter.
  • Reexamination certificates (C1, C2, C3…) publish changes to an already granted patent after the USPTO has reexamined its validity in light of new prior art.

This paper is concerned with describing a type of faceted indexing and its use in searching patent literature. It will not delve into the complexities of patent prosecution any more than necessary.

The Patent Search Problem -- What problem(s) are we trying to solve?

Why search patents? What are the kinds of search? Before a patent is issued, a search is conducted to help ensure that the invention claimed in the patent is novel and nonobvious over the prior art.

Types of Patent Searches

  • Claims-based search (examination, validity)
  • Patentability
  • Infringement
  • Clearance
  • State of the Art

The features of the Dynamic Indexing system described here are directed to claims-based searching.

Broad vs. Narrow Specification of Claim Limitations

A patent claim sets forth the subject matter that an inventor wishes to protect. In the context of patent searching, a claim can be considered as a recipe for conducting the search.

What are some prior concepts that might be helpful?

Let's take a look at some existing conceptual frameworks that are pertinent to patent searching.

prior concept #1 -- Every New Idea is a Combination of Old Ideas

The key concept underlying this project is that every new idea can be considered as a combination of old ideas. If this is so, an effective search of a 'new idea' entails decomposing the new idea into a set of old ideas and an efficient way of locating documents that disclose those old ideas.

In patent jurisprudence, the concept that new ideas are derived from old ideas is frequently made explicit. Examples:

In KSR International Co. v. Teleflex Inc. (2007), the Supreme Court held a claim to an adjustable vehicle pedal invalid as an obvious combination, declaring: "This is so because inventions in most, if not all, instances rely upon building blocks long since uncovered, and claimed discoveries almost of necessity will be combinations of what, in some sense, is already known."

The theme is not new: decades earlier, in Graham v. John Deere Co. (1966), the Court likewise examined the patent at issue as "a combination of old mechanical elements".

Patent searching does not carry the same philosophical overlay as its jurisprudential cousin, but the idea is implicit in the activity of searching.

prior concept #2 -- Hierarchical (Enumerative) Classification

In library science Hierarchical Classification theory posits that knowledge can be systematically arranged in a tree-like structure, progressing from general to specific categories to reflect logical subdivisions of subjects. This approach, rooted in Aristotelian principles of genus and species, enables efficient resource location by mirroring the natural relationships among concepts. It contrasts with flat or non-hierarchical systems by imposing a top-down order, where broader classes encompass narrower ones, often denoted through notation or indentation.

A foundational example is the Dewey Decimal Classification (DDC), introduced by Melvil Dewey in 1876, which divides knowledge into ten main classes (e.g., 000–099 for Computer Science, Information, and General Works), each further broken into divisions and sections (e.g., 500 for Natural Sciences, with 510 for Mathematics). This hierarchy facilitates browsing and shelving in libraries, as users can drill down from overarching disciplines to precise topics. Similarly, the Library of Congress Classification (LCC) employs alphanumeric codes in a hierarchical manner, with main classes like "Q" for Science subdividing into "QA" for Mathematics and further into "QA75" for Electronic Computers.

The theory emphasizes mnemonic devices and hospitality (allowing for insertions of new subjects without disrupting the structure) while addressing limitations such as rigidity in interdisciplinary areas. In practice, hierarchical systems promote consistency in cataloging but may require auxiliary tools like indexes for cross-references. This framework has influenced patent classification, where hierarchies aid in delineating technological scopes, though it can sometimes overlook multifaceted inventions.
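
As a rough illustration of the drill-down idea, a hierarchical scheme can be modeled as a nested mapping navigated from broad to narrow. The sketch below is a toy, with labels abbreviated from the DDC examples above, not a real catalog:

    # Toy model of a hierarchical (enumerative) classification, loosely based on
    # the DDC examples above. The labels are illustrative, not an authoritative list.
    DDC_SKETCH = {
        "000": {"label": "Computer Science, Information & General Works", "children": {}},
        "500": {"label": "Natural Sciences", "children": {
            "510": {"label": "Mathematics", "children": {}},
        }},
    }

    def drill_down(tree, path):
        """Follow a list of notations from the broadest class down to a narrower one."""
        node, labels = {"children": tree}, []
        for notation in path:
            node = node["children"][notation]   # raises KeyError for an invalid path
            labels.append(notation + " " + node["label"])
        return " > ".join(labels)

    print(drill_down(DDC_SKETCH, ["500", "510"]))
    # 500 Natural Sciences > 510 Mathematics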

prior concept #3 -- Faceted Classification

Faceted classification theory, pioneered by S.R. Ranganathan in the 1930s through his Colon Classification (CC), revolutionizes knowledge organization by breaking subjects into independent facets or attributes that can be combined dynamically to describe resources. Unlike enumerative systems that pre-list all possible classes, faceted approaches offer flexibility, allowing multidimensional access tailored to user needs. Ranganathan's Five Laws of Library Science underpin this, emphasizing that libraries should save users' time through adaptable structures. Core to the theory are fundamental facets, often summarized in Ranganathan's PMEST model: Personality (core topic), Matter (materials), Energy (processes), Space (location), and Time (period). For instance, a book on "Indian agriculture in the 20th century" might be classified by combining facets: Agriculture (Personality), Crops (Matter), Cultivation (Energy), India (Space), and 1900s (Time), yielding a notation like "J:2;4.44'N". This synthesis enables precise, context-specific retrieval without exhaustive enumeration.

In digital environments, faceted classification enhances search interfaces, as seen in e-commerce sites or library catalogs where users filter by multiple criteria (e.g., author, format, subject). It addresses hierarchies' limitations by accommodating complexity, such as in interdisciplinary fields, but requires careful facet design to avoid ambiguity. Applied to patent literature, faceted systems could dissect inventions by technical features, inventors, or applications, paving the way for innovative indexing schemes that improve discoverability in rapidly evolving technologies.

In short, faceted classification avoids redundancy by modularly combining attributes rather than enumerating every possible permutation.
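
A minimal sketch of faceted retrieval in this spirit (the records and facet values below are invented): each record carries independent facet values, and a query simply intersects whichever criteria the user selects rather than walking a single pre-built hierarchy.

    # Toy faceted retrieval: each record is described by independent facets (here a
    # simplified PMEST-style set) and can be filtered by any combination of criteria.
    records = [
        {"id": 1, "personality": "Agriculture", "matter": "Crops",
         "energy": "Cultivation", "space": "India", "time": "1900s"},
        {"id": 2, "personality": "Agriculture", "matter": "Soil",
         "energy": "Irrigation", "space": "Egypt", "time": "1950s"},
    ]

    def facet_search(records, **criteria):
        """Return the records matching every supplied facet value (logical AND)."""
        return [r for r in records
                if all(r.get(facet) == value for facet, value in criteria.items())]

    print(facet_search(records, personality="Agriculture", space="India"))  # record 1 only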

prior concept #4 -- Dynamic Programming and Memoization

Dynamic Programming (DP) is a computational paradigm in computer science used to solve complex problems by breaking them into smaller, overlapping subproblems, solving each subproblem only once, and storing their solutions for reuse. It is particularly effective for optimization problems where decisions at one stage affect future outcomes, such as shortest paths, resource allocation, or sequence alignment. DP is grounded in the principle of optimality, which states that an optimal solution to a problem contains optimal solutions to its subproblems. It contrasts with divide-and-conquer by addressing overlapping subproblems, avoiding redundant computations.

DP approaches typically follow two strategies: top-down (recursive with memoization) and bottom-up (iterative with tabulation). In the top-down approach, the problem is recursively divided, and solutions are cached to avoid recalculating results for identical subproblems. The bottom-up approach builds solutions iteratively from smaller to larger subproblems, storing results in a table; it is often more space-efficient but less intuitive for some problems.

A classic example is the Fibonacci sequence. Computing Fibonacci numbers naively (e.g., F(n) = F(n-1) + F(n-2)) leads to exponential time complexity due to repeated calculations. DP reduces this to linear time by storing intermediate results. For instance, calculating F(5) involves F(4) and F(3), but F(3) is reused, so storing its value eliminates redundant work.

Memoization is a key technique in the top-down DP approach, where intermediate results are stored in a data structure (e.g., array, hash table) to cache solutions to subproblems. When a subproblem is encountered again, the cached result is retrieved instead of recomputed, significantly improving efficiency. For example, in the Fibonacci case, a memoization table stores F(n) for each n computed, ensuring each subproblem is solved only once. This reduces the time complexity from O(2^n) to O(n), though it requires O(n) space for storage.

Memoization shines in problems like the knapsack problem, where items are selected to maximize value within a weight constraint. By caching results for subproblems (e.g., value achievable with a subset of items and remaining capacity), memoization avoids recalculating overlapping combinations. However, it may consume more memory than bottom-up DP due to recursive call stack overhead and is less intuitive for problems requiring iteration over states.
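
A minimal sketch of the top-down, memoized approach described above, using the Fibonacci example; the cache turns the naive exponential recursion into a linear-time computation:

    from functools import lru_cache

    @lru_cache(maxsize=None)            # memoization: each F(n) is computed only once
    def fib(n: int) -> int:
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    # Without the cache this recursion is O(2^n); with it, O(n) time and O(n) space.
    print(fib(5))     # 5
    print(fib(90))    # 2880067194370816120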

In the realm of computer science, which intersects with library and information science through algorithmic approaches to organizing and retrieving data—such as in patent classification systems—the term "memoization" holds particular significance. This technique, often employed in dynamic programming to optimize searches and classifications by caching results, traces its etymological roots to the Latin word "memorandum," meaning "to be remembered." The word was deliberately coined by British artificial intelligence researcher Donald Michie in 1968, drawing from "memo" (a common abbreviation for memorandum in American English) and appending the suffix "-ization" to denote the process of recording or storing for future recall. Michie introduced it in the context of computational efficiency, emphasizing how it transforms recursive functions into more structured, reusable forms—much like how faceted classification in library science breaks down complex subjects into combinable attributes for precise retrieval. This origin underscores memoization's role in enhancing systems akin to hierarchical patent classifications, where avoiding redundant computations mirrors the efficient navigation from broad categories to specific innovations in frameworks like the International Patent Classification (IPC).

prior concept #5 -- Design Patterns in Programming and Architecture

Current Approaches to the Patent Search Problem

How is patent searching done now?

Overview of Standard Patent Classification Systems -- What existing systems are we trying to improve?

Patent classification systems serve as essential tools for organizing and retrieving patent documents by grouping inventions according to their technical domains. These systems facilitate efficient searching, examination, and analysis of intellectual property across jurisdictions. The primary purpose is to enable effective management of vast patent literatures, ensuring that similar technologies are clustered for comparison and prior art assessment.

Among the most prominent systems is the International Patent Classification (IPC), a hierarchical framework established under the Strasbourg Agreement of 1971 and administered by the World Intellectual Property Organization (WIPO). The IPC divides technology into eight sections (A through H), further subdivided into classes, subclasses, groups, and subgroups, using a notation system like "A61K" for preparations for medical purposes. It is language-independent and applied globally to over 70 million patent documents, allowing for consistent classification regardless of the issuing authority.

In the United States, the U.S. Patent Classification (USPC) has historically organized patents into approximately 470 main classes and over 160,000 subclasses, based on subject matter similarity. Although the USPTO has been transitioning to the Cooperative Patent Classification (CPC) since 2013 (a joint system with the European Patent Office (EPO) that builds on the IPC with enhanced detail), the USPC remains relevant for older documents and design/plant patents. The CPC refines the IPC by incorporating additional subdivisions, making it particularly useful for precise technology mapping in fields like biotechnology or electronics.

Other notable systems include the European Classification (ECLA), which preceded the CPC and focused on EPO-specific needs, and national variants like Japan's File Index (FI) and F-terms for thematic searching. These classifications are typically hierarchical, enabling navigation from broad categories to specific innovations, though they face challenges in adapting to emerging technologies, often requiring periodic revisions. Overall, such systems underscore the need for structured organization in patent literature to support innovation and legal processes.
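
As a small illustration of this hierarchical notation, an IPC/CPC-style symbol such as "A61K 31/00" can be split into its levels. The parser below is a simplified sketch, not an implementation of the official symbol grammar:

    import re

    def parse_symbol(symbol: str) -> dict:
        """Split a simplified IPC/CPC-style symbol into its hierarchical parts.

        Example: 'A61K 31/00' -> section A, class 61, subclass K,
        main group 31, subgroup 00. Real symbols have more edge cases.
        """
        m = re.fullmatch(r"([A-H])(\d{2})([A-Z])\s+(\d+)/(\d+)", symbol.strip())
        if not m:
            raise ValueError("unrecognized symbol: " + symbol)
        section, cls, subclass, group, subgroup = m.groups()
        return {"section": section, "class": cls, "subclass": subclass,
                "main group": group, "subgroup": subgroup}

    print(parse_symbol("A61K 31/00"))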

Digests of Patents

Types of Digests

A Dynamic Indexing system uses several different types of digests:

s-type digests, i.e., Corpus Digests

These are the "source" digests. They constitute the corpus of documents that are or are to be indexed into the overall collection of facet digests.

d-type digests

These are the "detailed" digests. These d-type digests contain a set of documents that have been manually entered (tagged) into them.

b-type digests

These are the "broad" digests. The document content of a b-type digest is automatically determined by the documents of all the digests indented under it; this includes all the other b-type, d-type, h-type, and s-type digests that are indented below the given b-type. Each individual b-type digest constitutes its own page in a digest collection. Note that the first page in a digest collection is "page 0" and does not have a b-type digest associated with it, but all other pages in the collection are created from a b-type digest and list the digests indented under that digest.
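
A minimal sketch of that rule (the digest names and document numbers are hypothetical): the document content of a b-type digest is simply the union of the document sets of everything indented under it.

    # Sketch: a b-type digest's document content is the union of the documents in
    # all digests indented under it (digest names and document numbers are invented).
    def b_type_contents(indented_digests):
        docs = set()
        for doc_set in indented_digests.values():
            docs |= doc_set
        return docs

    indented = {
        "d: laser cutting heads":  {"US1111111", "US2222222"},
        "d: beam shaping optics":  {"US2222222", "US3333333"},
        "h: holding digest":       {"US4444444"},
    }
    print(sorted(b_type_contents(indented)))
    # ['US1111111', 'US2222222', 'US3333333', 'US4444444']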

h-type digests

These are the "holding" digests. The holding digests are the most algorithmically active digests and are unique to the "Dynamic Indexing" approach.

Here are the rules that govern the behavior of h-type digests. These rules are executed automatically by the system and there is no human intervention.

  • Every b-type digest has an associated h-type digest. When a b-type digest is created, its h-type digest is also automatically created.

  • A document can be tagged (entered) into either an h-type digest or the b-type digest associated therewith. If the document is not already in any of the other digests indented under the b-type digest (and thus not yet in the b-type digest), it will be tagged (entered) into the h-type digest and into the associated b-type digest; otherwise, it will not be entered into the h-type digest and will simply remain in the b-type digest.

  • If a document is currently in an h-type digest, but that document is subsequently entered (tagged) into another digest indented under the b-type digest associated with that h-type digest, the document will be entered into that other digest and will be automatically removed from the h-type digest. It stays in the associated b-type digest.

In short, the h-type digests are "holding" digests that hold a document until that document is placed into another digest indented under the associated b-type digest.

The h-type digests allow for much more efficient tagging of the system documents. It's easy to recognize broad areas where a document belongs, but much more challenging to drill down to the detailed concepts involved with the d-type digests. The h-type digests allow a document to be "held" in a broad digest awaiting future consideration to place it in a more detailed location.
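
The rules above can be sketched as a small tagging routine. The data model below is hypothetical; only the tagging behavior follows the rules: a document tagged at the broad level lands in the h-type digest only if no indented digest already contains it, and it is automatically released from the h-type digest as soon as it is tagged into a more detailed digest.

    # Sketch of the h-type "holding" behavior described above. The Page class and its
    # fields are hypothetical; only the tagging rules follow the text.
    class Page:
        def __init__(self):
            self.h_docs = set()     # the associated h-type (holding) digest
            self.detailed = {}      # name -> doc set for digests indented under the b-type
            self.b_docs = set()     # the b-type digest (auto-computed union)

        def _refresh_b(self):
            self.b_docs = self.h_docs.union(*self.detailed.values())

        def tag_broad(self, doc):
            """Tag a document at the b-type / h-type level."""
            if not any(doc in docs for docs in self.detailed.values()):
                self.h_docs.add(doc)        # held until a more detailed home is found
            self._refresh_b()

        def tag_detailed(self, digest_name, doc):
            """Tag a document into a digest indented under the b-type digest."""
            self.detailed.setdefault(digest_name, set()).add(doc)
            self.h_docs.discard(doc)        # automatically released from the holding digest
            self._refresh_b()

    page = Page()
    page.tag_broad("US1111111")                            # held: no detailed home yet
    page.tag_detailed("laser cutting heads", "US1111111")  # released from the holding digest
    print(page.h_docs, page.b_docs)                        # set() {'US1111111'}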

n-type digests

These are the "null" digests and do not contain any documents. They have a title and notes and are used to provide document-free information or simple formatting on a browser page with other digests.

The overall taxonomy structure -- how to organize Digests and why do it?

Given that there are several categories of digests, let's see how these digests can be organized into meaningful and useful patterns. Note that the Dynamic Indexing system was designed with patent documents in mind; the patterns below are based on concepts from technology, consistent with this patent-centric origin.

Abstraction Layers

  • Broader, more abstract at the top

Architectural Patterns

  • Hardware
  • Process
  • Document characteristics other than technical content
  • Document Sources

Technological Patterns or Templates

  • Overall Hardware System comprised of various interacting modules
  • Energy Modules
  • Material Supply Modules
  • Monitoring or Feedback Modules
  • Process Identifications
  • Before, During, and After Process steps
  • Environment
  • Purpose, teleological aspects

Putting it all together -- Dynamic Indexing

Ok -- we've discussed how to create a Dynamic Indexing system. Once created, how do we use it?

Searching with a Dynamic Indexing taxonomy -- using the Power Stack

The Power Stack is a search tool, unique to Dynamic Indexing, for quickly inspecting the contents of multiple patent digests.

Techniques

The workflow for the Power Stack is:

  • move digests into the Digest Pile
  • select Digest Pile digests to be included in the Basket
  • When the "Power Stack" button is clicked, all Basket digests will be loaded into the Power Stack
  • The ordering of the digests is significant. To change the order, go to "Manage" on the Basket page and use the priority-changing features.

What does power stack mean?

Given a basket of patent digests, a power stack displays all the usable AND/NOT Boolean combinations of the basket digests in a simple, easy-to-use format.

The term "power stack" is derived from the set theory concept of a power set combined with the word "stack" which connotes a prioritized grouping, i.e., a power set stacked in an orderly fashion.

Note that each member of the power set has a corresponding stack member. The stack member AND-s together the digests in the power set subset and NOT-s out the remaining basket digests, so each stack member is a unique AND/NOT combination over the basket digests.

Some theoretical aspects

In set theory, the power set is defined as the set of all the subsets of a set. The number of these subsets is 2^n, where n is the number of elements in the original set (the empty set is considered a subset). The power set of a set of patent digests thus has 2^n elements, where n is the number of digests, and each element is a unique subset of the original set. The number of subsets of practical interest is 2^n - 1, since the empty set is of no interest in this situation.

Each of these subsets corresponds to a Boolean search that yields a document set: the digests present in the subset are AND-ed together, combined with a NOT operation over the digests not in the subset, to yield the documents unique to that subset. Thus a power stack display for 3 selected digests corresponds to 2^3 - 1 = 7 distinct searches, and the power stack for 10 selected digests corresponds to over a thousand (2^10 - 1 = 1023).

The stacking aspect of a Power Stack is achieved by assigning a relative priority to each of the digests. Once this is done, each subset can be represented by a binary number. These binary numbers can be sorted, thus yielding a "stack" of power set subsets with the most important subsets at the top.

Notice that an individual document might be in multiple basket digests, but each individual document is in one and only one power stack pattern.
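
A minimal sketch of this enumeration (the basket digests and document numbers are invented): each non-empty subset of the basket is AND-ed, its complement is NOT-ed, and the resulting patterns are ordered by treating digest priority as bit position.

    from itertools import combinations

    # Hypothetical basket: digest name -> set of document numbers, listed in
    # priority order (highest-priority digest first).
    basket = {
        "laser welding":   {"US1", "US2", "US3"},
        "beam monitoring": {"US2", "US3", "US4"},
        "shield gas":      {"US3", "US5"},
    }

    def power_stack(basket):
        names = list(basket)                       # priority order = bit order
        all_docs = set().union(*basket.values())
        rows = []
        for r in range(1, len(names) + 1):         # every non-empty subset: 2^n - 1 of them
            for subset in combinations(names, r):
                docs = set(all_docs)
                for name in names:
                    if name in subset:
                        docs &= basket[name]       # AND the digests in the subset
                    else:
                        docs -= basket[name]       # NOT the digests outside the subset
                key = sum(1 << (len(names) - 1 - names.index(n)) for n in subset)
                rows.append((key, subset, docs))
        rows.sort(key=lambda row: row[0], reverse=True)   # binary sort: heaviest patterns first
        return rows

    for key, subset, docs in power_stack(basket):
        print(format(key, "03b"), " AND ".join(subset), sorted(docs))

Running this on the toy basket prints 7 rows (2^3 - 1), and each document number appears in exactly one row, illustrating the point above that every document falls into one and only one power stack pattern.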

Other approaches to patent searching -- Text Searching / Artificial Intelligence

Remember, the point here is to memoize conceptual understandings as they occur so that we do not have to repeat ourselves in a future task!