Word Embeddings as Statistical Estimators

Neil Dey¹, Matthew Singer¹, Jonathan P Williams²

¹Department of Statistics, North Carolina State University.

Sankhya. Series B. [Methodological.]

December 19, 2025

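Summary

This summary is machine-generated.

This study introduces a statistical framework for word embeddings that gives an interpretation of Word2Vec in terms of pointwise mutual information (PMI). A new missing-value-based estimator offers a statistically sound alternative with performance comparable to Word2Vec.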

Keywords:

word embeddings, statistical estimation, pointwise mutual information, Word2Vec, natural language processing, machine learning

Scientific fields:

Natural language processing
Statistical theory
Machine learning

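Background:

Word embeddings are central to NLP, but their theoretical understanding is lacking.
Current evaluation relies on empirical performance rather than on rigorous properties.
Formal inference and uncertainty quantification require a theoretical foundation.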

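Research objectives:

To provide a statistical-theoretic perspective on word embeddings.
To interpret classical methods such as Word2Vec within a formal statistical model.
To develop a new, statistically tractable alternative to existing word embedding techniques.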

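Main methods:

Proposed a copula-based statistical model for text data.
Interpreted Word2Vec as an estimator of the theoretical pointwise mutual information (PMI).
Developed a missing-value-based estimator, building on prior work.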
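
As a rough illustration of the quantity that Word2Vec is interpreted as estimating, the sketch below computes a plug-in PMI estimate from a word-by-context co-occurrence count matrix, using PMI(w, c) = log[p(w, c) / (p(w) p(c))]. The function name `empirical_pmi`, the toy counts, and the choice to mark zero-count pairs as NaN (unobserved) are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def empirical_pmi(counts):
    """Plug-in PMI estimate from a word-by-context co-occurrence count matrix.

    Pairs that never co-occur have an empirical PMI of -inf; here they are
    returned as NaN so that later steps can treat them as unobserved
    (an illustrative convention, not the paper's).
    """
    counts = np.asarray(counts, dtype=float)
    p_joint = counts / counts.sum()
    p_word = p_joint.sum(axis=1, keepdims=True)     # row marginals p(w)
    p_context = p_joint.sum(axis=0, keepdims=True)  # column marginals p(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint) - np.log(p_word) - np.log(p_context)
    pmi[counts == 0] = np.nan
    return pmi

# Hypothetical toy counts: 2 words x 2 contexts, one pair never observed.
toy_counts = np.array([[8, 0],
                       [2, 10]])
toy_pmi = empirical_pmi(toy_counts)
```

The zero-count pair is where the estimators differ: a truncation-based method replaces such entries with a fixed value, while a missing-value approach leaves them out of the fit.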

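Key results:

Demonstrated a connection between Word2Vec and estimation of the theoretical PMI.
The proposed missing-value estimator shows estimation error comparable to that of Word2Vec.
The new estimator outperforms truncation-based methods.
Achieved performance comparable to Word2Vec on an IMDb sentiment analysis task.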
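
The contrast between truncation-based and missing-value approaches can be sketched as follows. This summary does not give the paper's exact estimator, so the code is a generic stand-in under stated assumptions: the truncation baseline sets unobserved and negative PMI entries to zero before a truncated SVD, and the missing-value variant fits a low-rank factorization to the observed entries only, using ridge-regularized alternating least squares. All function names, the synthetic matrix, and the hyperparameters are hypothetical.

```python
import numpy as np

def truncation_embedding(pmi, dim):
    """Truncation baseline: unobserved (NaN) and negative PMI -> 0, then SVD."""
    filled = np.maximum(np.nan_to_num(pmi, nan=0.0), 0.0)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    scale = np.sqrt(s[:dim])
    return u[:, :dim] * scale, vt[:dim].T * scale

def missing_value_embedding(pmi, dim, n_iter=200, ridge=1e-3, seed=0):
    """Low-rank fit to observed PMI entries only (alternating least squares).

    NaN entries are treated as missing and never enter the loss. This is an
    assumed, generic matrix-completion scheme, not the authors' estimator.
    """
    observed = ~np.isnan(pmi)
    target = np.nan_to_num(pmi, nan=0.0)
    rng = np.random.default_rng(seed)
    n_words, n_contexts = pmi.shape
    word_vecs = rng.normal(scale=0.1, size=(n_words, dim))
    context_vecs = rng.normal(scale=0.1, size=(n_contexts, dim))
    penalty = ridge * np.eye(dim)
    for _ in range(n_iter):
        for i in range(n_words):       # update each word vector in turn
            c = context_vecs[observed[i]]
            word_vecs[i] = np.linalg.solve(c.T @ c + penalty,
                                           c.T @ target[i, observed[i]])
        for j in range(n_contexts):    # update each context vector in turn
            w = word_vecs[observed[:, j]]
            context_vecs[j] = np.linalg.solve(w.T @ w + penalty,
                                              w.T @ target[observed[:, j], j])
    return word_vecs, context_vecs

def observed_mse(pmi, word_vecs, context_vecs):
    """Mean squared reconstruction error over the observed entries only."""
    observed = ~np.isnan(pmi)
    residual = (word_vecs @ context_vecs.T - pmi)[observed]
    return float(np.mean(residual ** 2))

# Synthetic rank-2 "PMI" matrix with one entry per row marked as unobserved.
rng = np.random.default_rng(1)
pmi = rng.normal(size=(8, 2)) @ rng.normal(size=(6, 2)).T
for i in range(8):
    pmi[i, i % 6] = np.nan

mv_words, mv_contexts = missing_value_embedding(pmi, dim=2)
tr_words, tr_contexts = truncation_embedding(pmi, dim=2)
mv_error = observed_mse(pmi, mv_words, mv_contexts)
tr_error = observed_mse(pmi, tr_words, tr_contexts)
```

Comparing `mv_error` and `tr_error` on synthetic data only illustrates the mechanics; it does not reproduce the estimation-error or IMDb results reported above.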

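Conclusions:

The copula-based model provides a theoretical foundation for word embeddings.
The missing-value estimator offers a statistically interpretable and effective alternative.
This work bridges the gap between the empirical success of word embeddings and their theoretical understanding.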