Blog Posts

Here you’ll find my thoughts on machine learning, statistics, and data science.

Hangman with DQN and Transformers

TL;DR. We train a small bidirectional Transformer + Double DQN to play Hangman, restricted to words ≤5 characters (training and inference) to keep the demo clear and controlled.

  • Two-stage: masked‑LM style pretraining (letter presence + next‑letter) → Q-learning with replay, target network, and Huber loss.
  • Action masking guarantees we never re‑guess letters; optional dictionary-aware pruning further narrows choices.
  • Guided exploration samples from the pretrained policy (temperature‑scaled) instead of uniform random; a small information‑gain reward encourages guesses that shrink the candidate set.
  • Curriculum (2–3 → 4 → 5 → mixed ≤5) stabilizes learning.
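The action-masking idea in the second bullet can be sketched in a few lines. This is a minimal illustration under assumed names (`masked_greedy_action` and the 26-way Q-vector layout are not from the post): already-guessed letters get their Q-values forced to negative infinity before the argmax, so re-guessing is structurally impossible.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def masked_greedy_action(q_values, guessed):
    """Pick the highest-Q letter that has not been guessed yet.

    q_values : array-like of shape (26,), one Q-value per letter.
    guessed  : set of letters already tried this episode.
    """
    q = np.asarray(q_values, dtype=float).copy()
    for letter in guessed:
        q[ALPHABET.index(letter)] = -np.inf  # masked: never re-guess
    return ALPHABET[int(np.argmax(q))]

# 'e' has the top Q-value but was already guessed,
# so the mask forces the next-best letter, 'a'.
q = np.zeros(26)
q[ALPHABET.index("e")] = 2.0
q[ALPHABET.index("a")] = 1.5
print(masked_greedy_action(q, guessed={"e"}))  # -> a
```

The same mask can be applied to the exploration policy's logits, which is how the temperature-scaled sampling in the third bullet stays consistent with the greedy policy.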

GMV Forecasting via xDeepFM

In this post, I aim to share how I conducted a proof of concept (PoC) to solve a real-world problem using deep learning techniques, emphasizing a clean code structure.

Decoding LoRA: A Comprehensive Summary on Low-Rank Adaptation

Recently, I came across an intriguing article on low-rank techniques employed in Large Language Models (LLMs), specifically focusing on LoRA: Low-Rank Adaptation of Large Language Models. Here’s a succinct summary of the key concepts, along with additional discussion.

Comparing Sequence-to-Sequence Decoders: With and Without Attention

This post goes beyond a conventional code walkthrough inspired by this tutorial. My goal is to elevate the narrative by offering a comprehensive comparison between Seq2Seq (sequence-to-sequence) models with and without attention.

Note: Derivation of Normal Bayesian Test

The Normal Bayesian test is a statistical method used in hypothesis testing, particularly in the context of Bayesian statistics. It is applied to assess the validity of a null hypothesis ($H_0$) versus an alternative hypothesis ($H_1$) by considering the posterior distribution of a parameter of interest. This method is exemplified in Statistical Inference (2nd edition) by Casella and Berger, as shown on page 379.
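As a minimal sketch of the decision rule (notation hedged; the post carries the full derivation): the test accepts the hypothesis that is more probable under the posterior,

$$
\text{accept } H_0 \iff P(\theta \in \Theta_0 \mid \mathbf{x}) \ \ge\ P(\theta \in \Theta_0^c \mid \mathbf{x}).
$$

In the normal case with $X_1,\dots,X_n \sim \mathrm{N}(\theta,\sigma^2)$ and prior $\theta \sim \mathrm{N}(\mu,\tau^2)$, the posterior of $\theta$ given $\bar{x}$ is normal with mean $(n\tau^2\bar{x} + \sigma^2\mu)/(n\tau^2 + \sigma^2)$, so the rule reduces to a simple threshold on $\bar{x}$.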

Capturing Dominant Spatial Patterns with Two-Dimensional Locations Using SpatPCA

In this demonstration, we showcase how to utilize SpatPCA for analyzing two-dimensional data to capture the most dominant spatial pattern.

Apply SpatPCA to Capture the Dominant Spatial Pattern with One-Dimensional Locations

In this tutorial, we explore the application of SpatPCA to capture the most dominant spatial patterns in one-dimensional data, highlighting its performance under varying signal-to-noise ratios.

Personal Data De-identification

In this era of open data, how institutions can properly protect personal privacy throughout data collection, maintenance, and public sharing is a widely discussed issue. For example, users hope government data will become more transparent and incorporate new information, increasing its usability and thereby supporting more effective government decision-making. Figure 1 illustrates the full pipeline from collecting personal information to finally sharing the data. To protect privacy and avoid exposing individuals' identities, appropriate safeguards must be applied before the data is released — namely, de-identification. De-identification is a tool that lets an institution remove personal information from its own databases, so the data can be put to secondary use or shared with other institutions for academic or commercial research.

Figure 1. Data de-identification workflow (source: Garfinkel [5]).



Challenges in EOF Patterns with a Single Variable

In this post, we delve into potential challenges associated with Empirical Orthogonal Function (EOF) patterns.

Exploring Dominant Spatial Patterns with a Single Variable

In this post, we delve into the formal exploration of dominant spatial patterns, particularly in the context of climate research.

How to Work on Sea Surface Temperature (SST) Data

In this post, I will show you step-by-step instructions to work on SST data in R.

Three Fundamental Aspects of Statistical Models

Recently, I delved into a classic Annals of Statistics article, “Additive Regression and Other Nonparametric Models” by Stone (1985). The insights gleaned from this piece revolve around three fundamental aspects of statistical models: