Updated: May 24, 2026

Semi-supervised hashing for large-scale search.

Jun Wang¹, Sanjiv Kumar, Shih-Fu Chang

¹Business Analytics and Mathematical Sciences Department, IBM T.J. Watson Research Center, RM 31-229, 1101 Kitchawan Rd, Rte. 134, Yorktown Heights, NY 10598, USA. wangjun@us.ibm.com

IEEE Transactions on Pattern Analysis and Machine Intelligence

February 15, 2012

Summary

This summary is machine-generated.

This study introduces a semi-supervised hashing (SSH) framework for approximate nearest neighbor search that learns hash functions from both labeled and unlabeled data. In experiments on datasets of up to 80 million samples, the proposed methods outperform state-of-the-art supervised and unsupervised hashing techniques.

Area of Science:

Computer Science
Machine Learning
Data Mining

Background:

Hashing-based approximate nearest neighbor (ANN) search is widely used on large databases because it is efficient in both computation and memory.
Existing methods such as Locality Sensitive Hashing and Spectral Hashing build hash functions from random or principal projections, and the resulting codes are either not very accurate or inefficient.
Supervised hashing methods are prone to overfitting when the labeled data are scarce or noisy.
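
For readers unfamiliar with hashing-based ANN search, the baseline idea can be sketched in a few lines. This is a minimal illustration, assuming random Gaussian projections in the style of Locality Sensitive Hashing and a linear scan over Hamming distances; the function names `lsh_hash` and `hamming_rank`, the data sizes, and the code length are hypothetical and are not taken from the paper.

```python
import numpy as np

def lsh_hash(X, W):
    """Map each row of X to a binary code: one bit per projection, set when X @ W > 0."""
    return (X @ W > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Toy data (assumption): 1000 random 32-dimensional points, zero-centered.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))
X -= X.mean(axis=0)

# 16 random projection directions give 16-bit codes.
W = rng.standard_normal((32, 16))
codes = lsh_hash(X, W)

# Use the first database point as the query and rank everything by Hamming distance.
order, dists = hamming_rank(codes[0], codes)
```

Because the projections are random and ignore both the data distribution and any labels, many bits are typically needed for good accuracy, which is the limitation the summary attributes to such methods.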

Purpose of the Study:

To propose a semi-supervised hashing (SSH) framework for more accurate ANN search.
To develop robust hashing methods that leverage both labeled and unlabeled data.
To extend the sequential learning paradigm to unsupervised settings where no labeled pairs are available.

Main Methods:

Developed an SSH framework that minimizes empirical error on the labeled data while applying an information-theoretic regularizer over both labeled and unlabeled data.
Introduced three SSH methods: orthogonal hashing, nonorthogonal hashing, and sequential hashing.
Showed that sequential hashing corrects errors made by earlier hash functions and that the same paradigm extends to unsupervised learning.
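
The orthogonal variant can be sketched as an eigenvalue problem. The sketch below assumes a relaxed objective in which the projections maximize agreement with pairwise labels plus a variance term over all data, so that the hash directions are the top eigenvectors of an "adjusted covariance" matrix. The summary above does not spell out this formulation, so the matrix form, the function name `ssh_orthogonal`, the weight `eta`, and the toy data are assumptions for illustration only.

```python
import numpy as np

def ssh_orthogonal(X, labeled_idx, S, n_bits, eta=1.0):
    """Learn orthogonal projections from labeled pairs plus a variance regularizer.

    X           : (n, d) data matrix, one sample per row
    labeled_idx : indices of the labeled samples
    S           : (l, l) pairwise labels, +1 for neighbor pairs, -1 for non-neighbor pairs
    eta         : weight of the unsupervised term (assumed hyperparameter)
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    Xl = Xc[labeled_idx]
    # Supervised term (labeled pairs) plus regularizer over all samples.
    M = Xl.T @ S @ Xl + eta * (Xc.T @ Xc)
    M = (M + M.T) / 2  # guard against round-off asymmetry
    _, eigvecs = np.linalg.eigh(M)          # eigenvalues come back in ascending order
    W = eigvecs[:, ::-1][:, :n_bits]        # keep the n_bits largest
    return W, mean

# Toy data (assumption): two shifted Gaussian clusters, six labeled points.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 8)) + 3.0 * labels[:, None]
labeled_idx = np.array([0, 1, 2, 100, 101, 102])
yl = labels[labeled_idx]
S = np.where(yl[:, None] == yl[None, :], 1.0, -1.0)

W, mean = ssh_orthogonal(X, labeled_idx, S, n_bits=4)
codes = ((X - mean) @ W > 0).astype(np.uint8)
```

With only a handful of labeled pairs, the `eta` term keeps the solution from fitting the labels alone, which is the overfitting concern raised in the Background section.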

Main Results:

The proposed SSH methods outperform state-of-the-art supervised and unsupervised hashing techniques.
Sequential hashing generates particularly robust codes because each hash function is designed to correct errors made by the previous ones.
Experiments on large datasets of up to 80 million samples support these results.
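
The error-correcting behavior of sequential hashing can be illustrated with a boosting-like loop. This is a rough sketch under stated assumptions, not the authors' exact algorithm: after each bit, labeled pairs whose projections disagree with their label get a larger weight, and the data used by the regularizer is deflated along the direction just learned. The function names, the step size `alpha`, the weight `eta`, and the toy data are all hypothetical.

```python
import numpy as np

def top_eigvec(M):
    """Unit eigenvector of the symmetric part of M with the largest eigenvalue."""
    M = (M + M.T) / 2
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1]

def sequential_hash(X, labeled_idx, S, n_bits, eta=1.0, alpha=0.5):
    """Learn projections one at a time, reweighting pairs the previous bit got wrong."""
    Xc = X - X.mean(axis=0)
    Xl = Xc[labeled_idx]
    Xr = Xc.copy()                 # residual data for the unsupervised term
    S_k = S.astype(float).copy()   # pairwise label weights, updated every round
    W = []
    for _ in range(n_bits):
        M = Xl.T @ S_k @ Xl + eta * (Xr.T @ Xr)
        w = top_eigvec(M)
        W.append(w)
        proj = Xl @ w
        S_tilde = np.outer(proj, proj)        # pairwise agreement under this bit
        violated = (S_tilde * S_k) < 0        # sign disagrees with the label
        # Increase the magnitude of the weights on violated pairs only.
        S_k = S_k - alpha * np.where(violated, S_tilde, 0.0)
        # Remove the learned direction from the residual data.
        Xr = Xr - np.outer(Xr @ w, w)
    return np.stack(W, axis=1)

# Toy data (assumption): two shifted Gaussian clusters, six labeled points.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 8)) + 3.0 * labels[:, None]
labeled_idx = np.array([0, 1, 2, 100, 101, 102])
yl = labels[labeled_idx]
S = np.where(yl[:, None] == yl[None, :], 1.0, -1.0)

W_seq = sequential_hash(X, labeled_idx, S, n_bits=4)
codes_seq = ((X - X.mean(axis=0)) @ W_seq > 0).astype(np.uint8)
```

Unlike the orthogonal variant, the directions learned this way are not constrained to be mutually orthogonal; each one is chosen to fix what the earlier ones got wrong.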

Conclusions:

The SSH framework improves approximate nearest neighbor search by combining supervision from labeled data with regularization over unlabeled data.
Semi-supervised and unsupervised hashing methods can effectively handle large-scale, complex data.
The proposed sequential learning paradigm provides a robust approach to hashing.